pandoc-reader/pandoc_reader.py

import logging
import subprocess
import urllib.parse

from pelican import signals
from pelican.readers import BaseReader
from pelican.utils import pelican_open


try:
    import yaml
except ImportError:
    yaml = None
    logging.warning("YAML is not installed; the YAML reader will not work.")


class PandocReader(BaseReader):
    enabled = True
    file_extensions = ['md', 'markdown', 'mkd', 'mdown']

    def _get_meta_and_content(self, text):
        metadata = {}

        # Guard against an empty file before inspecting the first line.
        use_YAML = bool(text) and text[0] == '---' and yaml is not None
        if use_YAML:
            # Load the data we need to parse
            to_parse = []
            for line in text[1:]:
                # When we find a terminator (`---` or `...`), stop.
                if line in ('---', '...'):
                    # Do not include the terminator itself.
                    break

                # Otherwise, just keep adding the lines to the parseable.
                to_parse.append(line)

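            # safe_load avoids constructing arbitrary Python objects; an
            # empty front matter block parses to None, hence the fallback.
            parsed = yaml.safe_load("\n".join(to_parse)) or {}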

            # Postprocess to make the data usable by Pelican.
            for k in parsed:
                name, value = k.lower(), parsed[k]
                metadata[name] = self.process_metadata(name, value)

            # Keep the front matter in the content; the whole text is
            # passed to pandoc unchanged.
            content = "\n".join(text)

        else:
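            for i, line in enumerate(text):
                kv = line.split(':', 1)
                if len(kv) == 2:
                    name, value = kv[0].lower(), kv[1].strip()
                    metadata[name] = self.process_metadata(name, value)
                else:
                    content = "\n".join(text[i:])
                    break
            else:
                # Every line was metadata (or there were no lines at all),
                # so there is no body to render.
                content = ""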

        return metadata, content

    def read(self, filename):
        with pelican_open(filename) as fp:
            text = list(fp.splitlines())

        metadata, content = self._get_meta_and_content(text)

        extra_args = self.settings.get('PANDOC_ARGS', [])
        extensions = self.settings.get('PANDOC_EXTENSIONS', '')
        if isinstance(extensions, list):
            extensions = ''.join(extensions)

        pandoc_cmd = ["pandoc", "--from=markdown" + extensions, "--to=html5"]
        pandoc_cmd.extend(extra_args)

        # check=True raises CalledProcessError on a non-zero exit status.
        proc = subprocess.run(pandoc_cmd,
                              input=content.encode('utf-8'),
                              stdout=subprocess.PIPE,
                              check=True)
        output = proc.stdout.decode('utf-8')
        # Pandoc percent-encodes URLs aggressively, which breaks some links.
        # As a workaround, undo the quoting on the whole output. This is too
        # aggressive: literal percent escapes elsewhere in the content are
        # decoded as well. A targeted str.replace might be saner.
        output = urllib.parse.unquote(output)
        return output, metadata


def add_reader(readers):
    for ext in PandocReader.file_extensions:
        readers.reader_classes[ext] = PandocReader


def register():
    signals.readers_init.connect(add_reader)
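
The comment in `read` notes that unquoting the whole output is too aggressive. One possible narrower approach, shown here only as a sketch, is to decode just the values of `href` and `src` attributes. The helper name `unquote_url_attributes` and the regular expression are hypothetical and not part of the plugin. The sketch assumes pandoc emits these attributes with double quotes, and it does not guard against a decoded `%22` introducing a quote inside the attribute.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: double-quoted href="..." or src="..." values.
_URL_ATTR = re.compile(r'(\b(?:href|src)=")([^"]*)(")')


def unquote_url_attributes(html):
    """Percent-decode only href/src attribute values; leave other text alone."""
    return _URL_ATTR.sub(
        lambda m: m.group(1) + unquote(m.group(2)) + m.group(3), html)


sample = '<p>Use %20 for spaces: <a href="%7Bfilename%7Dpost.md">link</a></p>'
print(unquote_url_attributes(sample))
# prints: <p>Use %20 for spaces: <a href="{filename}post.md">link</a></p>
```

With a helper like this, the literal `%20` in the paragraph text survives, while the braces in the link target are restored.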