
Textify (OCR wrapper)

Turn the provided Python wrapper around ocrmypdf into a convenient terminal command named textify that can:

  • OCR PDFs (single file or whole directory)
  • Optionally combine many PDFs into one output
  • Optionally recompress PDFs to reduce file size
  • Optionally skip OCR (copy-only) via --no-ocr (useful if you just want combine/compress)

Note

This page uses Material for MkDocs markdown patterns.

What you’ll get

After installation you’ll be able to run:

  • Directory mode (OCR everything into a new folder)

    textify -d ~/Scans ~/Scans_text
    

  • Directory mode + combine result into one PDF

    textify -d -c ~/Scans ~/Scans_text
    

  • Directory mode + combine + compress (with a level)

    textify -d -c -z --compress-level ebook ~/Scans ~/Scans_text
    

  • Single file mode (optionally compress)

    textify -z ./input.pdf ./output.pdf
    


Prerequisites

Required

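  • Python 3.12 or newer (the script uses #!/usr/bin/env python3; it places backslash escapes such as \n inside f-string expressions, which Python versions before 3.12 reject with a SyntaxError)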
  • ocrmypdf (required unless you run with --no-ocr)
  • A working shell environment with the command available on your PATH

Optional (enables extra features)

  • Combining many PDFs → one:
      • qpdf (preferred), or
      • pdfunite (from Poppler), or
      • gs (Ghostscript)

  • Compressing PDFs:
      • gs (preferred for --compress-level), or
      • qpdf (fallback compressor; compression levels are ignored)

Language packs

OCR is done via Tesseract through ocrmypdf. The default language is eng. If you use --lang for other languages (e.g. isl+eng), you may need to install the corresponding Tesseract language data on your system.
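
To check whether the language packs for a given --lang value are present before a long run, something like the following may help. It is a minimal sketch, assuming the tesseract binary is on your PATH and that `tesseract --list-langs` prints a header line followed by one language code per line; the helper names are hypothetical and not part of textify.

```python
import shutil
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Parse `tesseract --list-langs` output into a set of language codes."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    # Assumption: the header line starts with "List of available languages".
    return {ln for ln in lines if not ln.lower().startswith("list of")}


def missing_languages(requested: str, installed: set[str]) -> list[str]:
    """Return the codes from an 'isl+eng'-style spec that are not installed."""
    return [code for code in requested.split("+") if code and code not in installed]


def installed_languages() -> set[str]:
    """Ask the local tesseract binary which language packs it can see."""
    if shutil.which("tesseract") is None:
        return set()
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some versions print the list on stderr
        text=True,
        check=False,
    )
    return parse_list_langs(result.stdout)


if __name__ == "__main__":
    missing = missing_languages("isl+eng", installed_languages())
    print("missing language packs:", ", ".join(missing) or "none")
```

If the script reports a missing code, install the matching Tesseract language data with your package manager and run it again.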


Install dependencies

macOS (Homebrew)

# Required
brew install ocrmypdf

# Optional: any of the following will enable extra features
brew install qpdf             # best for combining, fallback compressor
brew install poppler          # gives `pdfunite` as combine fallback
brew install ghostscript      # combine (fallback) + best compression levels

Debian/Ubuntu (apt)

# Required (package names can vary by distro version)
sudo apt update
sudo apt install -y ocrmypdf

# Optional
sudo apt install -y qpdf poppler-utils ghostscript

Other systems

  • Install ocrmypdf using your package manager.
  • Install at least one combine tool (qpdf, pdfunite, or gs) if you want -c/--combine.
  • Install ghostscript or qpdf if you want -z/--compress.

What happens when tools are missing?

  • If you run OCR (no --no-ocr) and ocrmypdf is missing, textify exits with 127.
  • If you request combine and no combiner is available, the combine step fails with a clear error (and exits 127).
  • If you request compress and no compressor is available, the compression step fails with a clear error (and exits 127).
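
If you call textify from another program, the exit status is the simplest way to tell these cases apart. The following is a sketch under the assumption that textify is installed on your PATH; `describe_exit` and `run_textify` are hypothetical helper names.

```python
import subprocess

EXIT_MEANINGS = {
    0: "all requested steps completed",
    1: "at least one file or step failed",
    127: "a required tool is missing",
}


def describe_exit(code: int) -> str:
    """Translate a textify exit status into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_textify(args: list[str]) -> int:
    """Run textify with the given arguments and return its exit status."""
    try:
        return subprocess.run(["textify", *args], check=False).returncode
    except FileNotFoundError:
        # textify itself is not on PATH; reuse the shell's "not found" status.
        return 127
```

A caller could then write `print(describe_exit(run_textify(["-d", "in_dir", "out_dir"])))`, where the two directory names are placeholders.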

1) Create the script

Pick a location you control, e.g. ~/bin (create it if it doesn’t exist):

mkdir -p ~/bin
vim ~/bin/textify

Paste the script below and save the file as textify.

Show script (Python) — save as ~/bin/textify
#!/usr/bin/env python3
"""
textify — thin wrapper around ocrmypdf with optional PDF combine and compression

Usage:
Directory mode:
    ./textify -d [--no-ocr] [-c|--combine] [-z|--compress [--compress-level LEVEL]] [--lang LANG] [-V|--ocr-verbose] SRC_DIR DST_DIR

Single file mode:
    ./textify [--no-ocr] [-z|--compress [--compress-level LEVEL]] [--lang LANG] [-V|--ocr-verbose] SRC_PDF DST_PDF
"""

import argparse
import os
import shutil
import sys
from pathlib import Path
from subprocess import run, PIPE, STDOUT

# ── Neon ANSI styling ──────────────────────────────────────────────────────────
USE_COLOR = sys.stdout.isatty() and os.environ.get("NO_COLOR") is None


def _paint(code: str, text: str) -> str:
    return f"\033[{code}m{text}\033[0m" if USE_COLOR else text


NEON = {
    "pink": "1;38;5;199",
    "cyan": "1;38;5;51",
    "purple": "1;38;5;135",
    "green": "1;38;5;82",
    "yellow": "1;38;5;226",
    "red": "1;38;5;197",
    "blue": "1;38;5;45",
    "orange": "1;38;5;208",  # iteration counters
    "dim": "2;38;5;244",
}


def neon(text: str, color: str) -> str:
    return _paint(NEON[color], text)


def tag(label: str, color: str) -> str:
    return neon(f"[{label}]", color)


def arrow() -> str:
    return neon(" → ", "blue")


def bullet() -> str:
    return neon("◆", "purple")


# ── Pink vertical gutter bar ───────────────────────────────────────────────────
def pick_bar_char() -> str:
    try:
        "┃".encode(sys.stdout.encoding or "utf-8")
        return "┃"
    except Exception:
        return "|"


VBAR_RAW = pick_bar_char()
VBAR = neon(VBAR_RAW, "pink")


def gutter(line: str = "", spaces: int = 2) -> str:
    return f"{VBAR}{' ' * spaces}{line}"


def group_header(left: str) -> str:
    return neon(left, "pink")


def group_footer() -> str:
    return neon("┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━", "pink")


# ── OCR config ─────────────────────────────────────────────────────────────────
OCR_BASE = ["ocrmypdf", "--force-ocr", "--optimize", "0", "--deskew"]


def build_ocr_cmd(ocr_lang: str) -> list[str]:
    cmd = OCR_BASE.copy()
    if ocr_lang:
        cmd += ["-l", ocr_lang]
    return cmd


def which(cmd: str) -> Path | None:
    p = shutil.which(cmd)
    return Path(p) if p else None


def ensure_ocrmypdf():
    if which("ocrmypdf") is None:
        print(neon("error:", "red"), "'ocrmypdf' not found in PATH. Install it via Homebrew.", file=sys.stderr)
        sys.exit(127)


def is_pdf(path: Path) -> bool:
    return path.is_file() and path.suffix.lower() == ".pdf"


def run_ocr_quiet(src: Path, dst: Path, ocr_lang: str) -> tuple[int, list[str] | None]:
    """Run ocrmypdf, capturing output. Return (rc, tail_lines_on_error_or_None)."""
    cmd = build_ocr_cmd(ocr_lang) + [str(src), str(dst)]
    result = run(cmd, stdout=PIPE, stderr=STDOUT, check=False)
    if result.returncode != 0 and result.stdout is not None:
        lines = result.stdout.decode(errors="ignore").splitlines()
        tail = lines[-20:] if lines else []
        return result.returncode, tail
    return result.returncode, None


def run_ocr_verbose(src: Path, dst: Path, ocr_lang: str) -> int:
    """Run ocrmypdf with passthrough output."""
    cmd = build_ocr_cmd(ocr_lang) + [str(src), str(dst)]
    return run(cmd, check=False).returncode


def convert_one(
    src: Path, dst: Path, ocr_lang: str, passthrough_logs: bool, skip_ocr: bool
) -> tuple[int, list[str] | None]:
    if not src.exists():
        return 1, [f"source does not exist: {src}"]
    if not is_pdf(src):
        return 1, [f"source is not a .pdf: {src}"]
    if dst.exists():
        return 1, [f"destination already exists (won't overwrite): {dst}"]

    dst.parent.mkdir(parents=True, exist_ok=True)

    # ── SKIP OCR / COPY MODE ──
    if skip_ocr:
        try:
            shutil.copy2(src, dst)
            return 0, None
        except Exception as e:
            return 1, [f"copy failed: {e}"]

    # ── NORMAL OCR MODE ──
    if passthrough_logs:
        rc = run_ocr_verbose(src, dst, ocr_lang)
        return rc, None
    else:
        return run_ocr_quiet(src, dst, ocr_lang)


# ── Combine PDFs ───────────────────────────────────────────────────────────────
def find_unique_combined_path(out_dir: Path) -> Path:
    base = out_dir / "textified-combined.pdf"
    if not base.exists():
        return base
    i = 1
    while True:
        cand = out_dir / f"textified-combined{i}.pdf"
        if not cand.exists():
            return cand
        i += 1


def run_combine_tool(inputs: list[Path], out_file: Path) -> tuple[int, str]:
    """Core logic to run the best available PDF combine tool."""
    inputs_str = [str(p) for p in inputs]

    if which("qpdf"):
        cmd = ["qpdf", "--empty", "--pages", *inputs_str, "--", str(out_file)]
        return run(cmd, check=False).returncode, "qpdf"

    elif which("pdfunite"):
        cmd = ["pdfunite", *inputs_str, str(out_file)]
        return run(cmd, check=False).returncode, "pdfunite"

    elif which("gs"):
        cmd = ["gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite", f"-sOutputFile={out_file}", *inputs_str]
        return run(cmd, check=False).returncode, "ghostscript"

    else:
        return 127, "none"


def combine_files_visually(inputs: list[Path], out_file: Path) -> int:
    """Runs the combine tool with UI feedback."""
    if not inputs:
        print(neon("note:", "yellow"), "nothing to combine.", file=sys.stderr)
        return 0

    rc, tool = run_combine_tool(inputs, out_file)

    if tool == "none":
        print(
            neon("error:", "red")
            + " no PDF combiner found (tried qpdf, pdfunite, gs). Install one via:\n"
            + "  brew install qpdf   # recommended\n",
            file=sys.stderr,
        )
        return 127

    print(group_header(f"┏━ COMBINE {neon(tool, 'purple')}{arrow()}{neon(out_file.name, 'green')}"))
    if rc == 0:
        print(gutter(f"  {tag('OK', 'green')} combined {len(inputs)} files into {out_file.name}", spaces=1))
    else:
        print(gutter(f"  {tag('FAIL', 'red')} combining into {out_file} {neon(f'(rc={rc})', 'dim')}", spaces=1))
    print(group_footer(), end="\n\n")
    return rc


# ── Compress PDFs ──────────────────────────────────────────────────────────────
def choose_compressor(level: str):
    """Pick a compressor once, return (tool_label, build_cmd(in_path, out_path))."""
    if which("gs"):
        tool = f"ghostscript/{level}"

        def build_cmd(inp: Path, outp: Path):
            return [
                "gs",
                "-dBATCH",
                "-dNOPAUSE",
                "-dQUIET",
                "-sDEVICE=pdfwrite",
                "-dCompatibilityLevel=1.6",
                f"-dPDFSETTINGS=/{level}",
                f"-sOutputFile={outp}",
                str(inp),
            ]

        return tool, build_cmd
    if which("qpdf"):
        tool = "qpdf"

        def build_cmd(inp: Path, outp: Path):
            return ["qpdf", "--object-streams=generate", "--compress-streams=y", str(inp), str(outp)]

        return tool, build_cmd
    return None, None


def compress_many(paths: list[Path], level: str) -> int:
    paths = [p for p in paths if p and p.exists()]
    if not paths:
        return 0

    tool, build_cmd = choose_compressor(level)
    if tool is None:
        print(neon("error:", "red") + " cannot compress — install ghostscript or qpdf.", file=sys.stderr)
        return 127

    print(group_header(f"\n┏━ COMPRESS {neon(tool, 'purple')} {neon('━━━━━━━━━━━━━━━━━━━━━━━━━━\n┃', 'pink')}"))

    failures = 0
    total = len(paths)
    for idx, p in enumerate(paths, 1):
        iter_tag = neon(f"[{idx}/{total}]", "orange")
        print(gutter(f"{bullet()} {iter_tag} compressing {neon(p.name, 'cyan')}", spaces=2))

        tmp_out = p.with_name(p.stem + ".tmp.pdf")
        cmd = build_cmd(p, tmp_out)
        rc = run(cmd, check=False).returncode

        if rc == 0 and tmp_out.exists():
            try:
                os.replace(tmp_out, p)
                print(gutter(f"  {tag('OK', 'green')} compressed {p.name}", spaces=2))
            except OSError as e:
                print(gutter(f"  {tag('FAIL', 'red')} could not replace original for {p}: {e}", spaces=2))
                Path(tmp_out).unlink(missing_ok=True)
                failures += 1
        else:
            print(gutter(f"  {tag('FAIL', 'red')} compressing {p} {neon(f'(rc={rc})', 'dim')}", spaces=2))
            Path(tmp_out).unlink(missing_ok=True)
            failures += 1

        print(gutter("", spaces=2))

    print(group_footer(), end="\n\n")

    if failures:
        print(tag("SUMMARY", "yellow"), neon(f"compression: {total - failures} ok, {failures} failed", "yellow"), end="\n\n")
        return 1
    else:
        print(tag("SUMMARY", "green"), neon(f"compression: {total} ok", "green"), end="\n\n")
        return 0


# ── Directory & File modes ─────────────────────────────────────────────────────
def handle_directory_mode(
    src_dir: Path,
    dst_dir: Path,
    combine: bool,
    do_compress: bool,
    compress_level: str,
    ocr_lang: str,
    passthrough_logs: bool,
    skip_ocr: bool,
) -> int:
    if not src_dir.exists() or not src_dir.is_dir():
        print(neon("error:", "red"), f"not a directory: {src_dir}", file=sys.stderr)
        return 1
    if src_dir.resolve() == dst_dir.resolve():
        print(neon("error:", "red"), "output directory must be different from input directory.", file=sys.stderr)
        return 1

    dst_dir.mkdir(parents=True, exist_ok=True)
    pdfs = sorted([p for p in src_dir.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"])
    if not pdfs:
        print(neon("note:", "yellow"), f"no .pdf files found in {src_dir}")
        return 0

    # ──────────────────────────────────────────────────────────────────────────
    # MODE A: COMBINE IS ON (Combine Source -> OCR Combined -> Compress Combined)
    # ──────────────────────────────────────────────────────────────────────────
    if combine:
        # 1. Combine Raw Sources to a temp file
        temp_merged = dst_dir / ".textify_temp_merged_source.pdf"

        rc_comb = combine_files_visually(pdfs, temp_merged)
        if rc_comb != 0 or not temp_merged.exists():
            # Keep 127 (no combiner installed) distinct from an ordinary failure.
            return 127 if rc_comb == 127 else 1

        # 2. Convert (OCR or Copy) the single merged file
        final_dst = find_unique_combined_path(dst_dir)

        if skip_ocr:
            header_title = f"┏━ COPY {neon('(NO OCR)', 'dim')} {neon('━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n┃', 'pink')}"
            action_verb = "copying"
        else:
            lang_hint = f" -l {ocr_lang}" if ocr_lang else ""
            header_title = f"┏━ OCR {neon('ocrmypdf', 'purple')}{neon(lang_hint, 'dim')} {neon('━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n┃', 'pink')}"
            action_verb = "converting"

        print(group_header(header_title))

        # Mimic iteration UI for the single big file
        print(gutter(f"{bullet()} {neon('[1/1]', 'orange')} {action_verb} {neon('merged document', 'cyan')}", spaces=2))

        rc_ocr, tail = convert_one(temp_merged, final_dst, ocr_lang, passthrough_logs, skip_ocr)

        # Clean up temp raw file immediately
        try:
            temp_merged.unlink()
        except OSError:
            pass

        if rc_ocr == 0:
            print(gutter(f"  {tag('OK', 'green')} {temp_merged.name}{arrow()}{final_dst.name}", spaces=2))
        else:
            print(gutter(f"  {tag('FAIL', 'red')} {temp_merged.name}{arrow()}{final_dst.name} {neon(f'(rc={rc_ocr})', 'dim')}", spaces=2))
            if tail:
                print(gutter(neon("── ocrmypdf (last 20 lines) ─────────────────────────────", "dim"), spaces=2))
                for t in tail:
                    print(gutter(neon(t, "dim"), spaces=2))

        print(group_footer(), end="\n\n")

        if rc_ocr != 0:
            return 1

        # 3. Compress the single result
        if do_compress:
            return compress_many([final_dst], compress_level)

        return 0

    # ──────────────────────────────────────────────────────────────────────────
    # MODE B: COMBINE IS OFF (Process Individually)
    # ──────────────────────────────────────────────────────────────────────────
    else:
        if skip_ocr:
            header_title = f"┏━ COPY {neon('(NO OCR)', 'dim')} {neon('━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n┃', 'pink')}"
            action_verb = "copying"
        else:
            lang_hint = f" -l {ocr_lang}" if ocr_lang else ""
            header_title = f"┏━ OCR {neon('ocrmypdf', 'purple')}{neon(lang_hint, 'dim')} {neon('━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n┃', 'pink')}"
            action_verb = "converting"

        print(group_header(header_title))
        failures = 0
        produced: list[Path] = []
        total = len(pdfs)

        for idx, src in enumerate(pdfs, 1):
            iter_tag = neon(f"[{idx}/{total}]", "orange")
            line_left = f"{bullet()} {iter_tag} {action_verb} {neon(src.name, 'cyan')}"
            print(gutter(line_left, spaces=2))

            dst = dst_dir / src.name
            rc, tail = convert_one(src, dst, ocr_lang, passthrough_logs, skip_ocr)
            if rc == 0:
                print(gutter(f"  {tag('OK', 'green')} {src.name}{arrow()}{dst.name}", spaces=2))
                produced.append(dst)
            else:
                print(gutter(f"  {tag('FAIL', 'red')} {src.name}{arrow()}{dst.name} {neon(f'(rc={rc})', 'dim')}", spaces=2))
                if tail:
                    print(gutter(neon("── ocrmypdf (last 20 lines) ─────────────────────────────", "dim"), spaces=2))
                    for t in tail:
                        print(gutter(neon(t, "dim"), spaces=2))
                failures += 1
            print(gutter("", spaces=2))
        print(group_footer(), end="\n\n")

        print(tag("SUMMARY", "purple"), neon(f"{len(produced)} succeeded, {failures} failed.", "yellow"), end="\n\n")

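        # Run compression on what succeeded, but do not let a clean
        # compression pass hide earlier conversion failures.
        rc_compress = compress_many(produced, compress_level) if do_compress and produced else 0
        if rc_compress != 0:
            return rc_compress

        return 0 if failures == 0 else 1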


def handle_file_mode(
    src: Path,
    dst: Path,
    combine: bool,
    do_compress: bool,
    compress_level: str,
    ocr_lang: str,
    passthrough_logs: bool,
    skip_ocr: bool,
) -> int:
    if combine:
        print(neon("note:", "yellow"), "--combine is intended for -d (directory) mode; ignoring in single-file mode.")

    # Single-file: mimic group look with a mini OCR group
    if skip_ocr:
        header_title = f"┏━ COPY {neon('(NO OCR)', 'dim')} {neon('━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━', 'pink')}"
        action_verb = "copying"
    else:
        lang_hint = f" -l {ocr_lang}" if ocr_lang else ""
        header_title = f"\n┏━ OCR {neon('ocrmypdf', 'purple')}{neon(lang_hint, 'dim')} {neon('━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━', 'pink')}"
        action_verb = "converting"

    print(group_header(header_title))
    iter_tag = neon("[1/1]", "orange")
    print(gutter(f"{bullet()} {iter_tag} {action_verb} {neon(Path(src).name, 'cyan')}", spaces=2))

    rc, tail = convert_one(src, dst, ocr_lang, passthrough_logs, skip_ocr)

    if rc == 0:
        print(gutter(f"  {tag('OK', 'green')} {src.name}{arrow()}{dst.name}", spaces=2))
    else:
        print(gutter(f"  {tag('FAIL', 'red')} {src.name}{arrow()}{dst.name} {neon(f'(rc={rc})', 'dim')}", spaces=2))
        if tail:
            print(gutter(neon("── ocrmypdf (last 20 lines) ─────────────────────────────", "dim"), spaces=2))
            for t in tail:
                print(gutter(neon(t, "dim"), spaces=2))
    print(group_footer(), end="\n\n")

    if rc == 0 and do_compress:
        # compress_many returns 0, 1, or 127 (no compressor installed); pass it through.
        return compress_many([dst], compress_level)
    return rc


# ── CLI ────────────────────────────────────────────────────────────────────────
def parse_args():
    p = argparse.ArgumentParser(
        prog="textify", description="Convert PDFs to OCR'd PDFs using ocrmypdf, with optional combine & compression."
    )
    p.add_argument(
        "-d", "--directory", action="store_true", help="Directory mode: treat the two paths as <src_dir> <dst_dir>."
    )
    p.add_argument(
        "--no-ocr", action="store_true", help="Skip the OCR step (copy only). Useful if just compressing/combining."
    )
    p.add_argument(
        "-c",
        "--combine",
        action="store_true",
        help="Combine sources FIRST, then OCR/Compress only the single combined file (only in -d mode).",
    )
    p.add_argument(
        "-z",
        "--compress",
        action="store_true",
        help="After conversion, recompress PDFs (in place) using Ghostscript/qpdf.",
    )
    p.add_argument(
        "--compress-level",
        choices=["screen", "ebook", "printer", "prepress", "default"],
        default="ebook",
        help="Compression level for Ghostscript (ignored if only qpdf is available). Default: ebook.",
    )
    p.add_argument(
        "--lang",
        "--ocr-lang",
        dest="ocr_lang",
        default="eng",
        help="OCR language(s) for Tesseract via ocrmypdf (-l). Example: --lang isl+eng",
    )
    p.add_argument(
        "-V", "--ocr-verbose", action="store_true", help="Show raw ocrmypdf output (passthrough). Default is quiet."
    )
    p.add_argument("src", help="Source path (pdf or directory).")
    p.add_argument("dst", help="Destination path (new pdf or output directory).")
    return p.parse_args()


def main():
    args = parse_args()

    if not args.no_ocr:
        ensure_ocrmypdf()

    src = Path(args.src)
    dst = Path(args.dst)

    if args.directory:
        code = handle_directory_mode(
            src, dst, args.combine, args.compress, args.compress_level, args.ocr_lang, args.ocr_verbose, args.no_ocr
        )
    else:
        code = handle_file_mode(
            src, dst, args.combine, args.compress, args.compress_level, args.ocr_lang, args.ocr_verbose, args.no_ocr
        )
    sys.exit(code)


if __name__ == "__main__":
    main()

2) Make it executable

chmod u+x ~/bin/textify

3) Make it accessible everywhere

Choose one of the options below.

Add ~/bin to PATH

Zsh or Bash

# Add once (run only the line for the shell you use)
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.zshrc   # if you use zsh
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc  # if you use bash

# Reload (run only the line for the file you updated)
source ~/.zshrc    # zsh
source ~/.bashrc   # bash

# Test
command -v textify

Fish

set -U fish_user_paths $HOME/bin $fish_user_paths
command -v textify

Symlink

sudo ln -s ~/bin/textify /usr/local/bin/textify
command -v textify

Alias

Zsh or Bash

echo 'alias textify="$HOME/bin/textify"' >> ~/.zshrc   # or ~/.bashrc
source ~/.zshrc    # or: source ~/.bashrc
textify -h

Fish

alias textify $HOME/bin/textify
funcsave textify
textify -h

Warning

If you use an alias, it exists only in interactive shells. Prefer PATH or a symlink for scripts and non-interactive contexts.


4) Verify the installation

Run the command to see the help:

textify -h

Note

Help text can vary slightly by platform and argparse version.


Usage

Directory mode

Convert every *.pdf from a source folder into a destination folder (same filenames):

textify -d ~/Scans ~/Scans_text
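
To preview which files a directory run would pick up, the selection rule can be reproduced in a few lines. This is a sketch that mirrors the script's filter (regular files directly inside the source folder whose suffix is .pdf in any letter case); `pdf_inputs` is a hypothetical name.

```python
from pathlib import Path


def pdf_inputs(src_dir: Path) -> list[Path]:
    """Return top-level files ending in .pdf (any case), sorted by path."""
    return sorted(
        p for p in src_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    for path in pdf_inputs(Path.home() / "Scans"):  # assumes ~/Scans exists
        print(path.name)
```

Files in subdirectories are not listed, because `iterdir()` does not recurse.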

Directory mode + combine

Combine the source PDFs first, then OCR (or copy) only the single combined document.

textify -d -c ~/Scans ~/Scans_text

Combine behavior

  • The script prefers qpdf, then pdfunite, then ghostscript.
  • The combined output is written to the destination directory as:
      • textified-combined.pdf
      • textified-combined1.pdf, textified-combined2.pdf, ... if the name already exists.
  • A temporary merge file is created in the destination folder during combine (and cleaned up afterward).
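
The naming rule for the combined file can be sketched as a pure function. It takes the set of names already present in the destination folder rather than touching the disk, which is a simplification of what the script does; `next_combined_name` is a hypothetical name.

```python
def next_combined_name(existing: set[str]) -> str:
    """Pick the first free name: textified-combined.pdf, then ...1.pdf, ...2.pdf."""
    base = "textified-combined.pdf"
    if base not in existing:
        return base
    i = 1
    while f"textified-combined{i}.pdf" in existing:
        i += 1
    return f"textified-combined{i}.pdf"


print(next_combined_name(set()))
print(next_combined_name({"textified-combined.pdf", "textified-combined1.pdf"}))
```

Because the counter always starts at 1, a gap left by a deleted file (for example textified-combined1.pdf) is reused on the next run.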

Directory mode + combine + compress

textify -d -c -z --compress-level ebook ~/Scans ~/Scans_text

Single file mode

Create an OCR’d PDF at a new path (will not overwrite existing files):

textify ./input.pdf ./output.pdf

Optionally compress the result:

textify -z ./input.pdf ./output.pdf

Note

--combine is intended for -d/--directory mode. In single-file mode, --combine is ignored with a helpful note.


Verbose vs. quiet OCR logs

Default runs are quiet and summarize success/failure. To see raw ocrmypdf logs as they happen:

textify -V ./input.pdf ./output.pdf
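
The quiet mode boils down to capturing the child process's output and showing only its last lines when it fails. Here is a minimal, self-contained sketch of that pattern; it uses a small Python one-liner as a stand-in for a failing ocrmypdf run, and `run_quiet` is a hypothetical name.

```python
import subprocess
import sys


def run_quiet(cmd: list[str], tail_lines: int = 20) -> tuple[int, list[str]]:
    """Run cmd with stdout and stderr captured together; return (rc, tail).

    The tail is empty on success and holds the last `tail_lines` lines on failure.
    """
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=False
    )
    if result.returncode == 0:
        return 0, []
    lines = result.stdout.decode(errors="ignore").splitlines()
    return result.returncode, lines[-tail_lines:]


# Stand-in for a failing run: print 30 lines, then exit with status 3.
noisy = [sys.executable, "-c",
         "import sys\nfor i in range(30): print('line', i)\nsys.exit(3)"]
rc, tail = run_quiet(noisy)
print(rc, len(tail), tail[-1])  # prints: 3 20 line 29
```

With -V the script skips the capture entirely and lets the child write straight to the terminal.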

Tip

Output uses ANSI colors when writing to a TTY. Set NO_COLOR=1 to disable colorization:

NO_COLOR=1 textify -d ~/Scans ~/Scans_text
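
The color decision is small enough to show in full. The sketch below restates the script's rule as pure functions so it can be tried without a terminal; `use_color` and `paint` are illustrative names (the script itself uses a module-level flag).

```python
def use_color(is_tty: bool, env: dict[str, str]) -> bool:
    """Color is on only for a TTY and only when NO_COLOR is absent from env."""
    return is_tty and "NO_COLOR" not in env


def paint(code: str, text: str, enabled: bool) -> str:
    """Wrap text in an ANSI SGR sequence, or return it unchanged."""
    return f"\033[{code}m{text}\033[0m" if enabled else text


print(use_color(True, {}))                # True
print(use_color(True, {"NO_COLOR": "1"}))  # False
print(use_color(False, {}))               # False
```

Note that the check is for the variable's presence, so even an empty NO_COLOR= disables color.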


Copy-only mode (skip OCR)

Use --no-ocr to skip OCR entirely (it will copy PDFs instead). This is useful when you only want to:

  • Combine a directory into a single PDF, or
  • Compress PDFs, without OCR

textify -d --no-ocr ~/Scans ~/Scans_text
textify -d --no-ocr -c ~/Scans ~/Scans_text
textify -d --no-ocr -z --compress-level ebook ~/Scans ~/Scans_text

Note

When --no-ocr is set, ocrmypdf is not required and is not checked.


CLI options

| Option | Meaning |
|---|---|
| -d, --directory | Directory mode: treat the two paths as <src_dir> <dst_dir> |
| --no-ocr | Skip OCR step (copy only) |
| -c, --combine | Directory mode only: combine sources first, then OCR/copy the single combined file |
| -z, --compress | Compress output PDFs in-place after creation |
| --compress-level {screen,ebook,printer,prepress,default} | Ghostscript profile (ignored if only qpdf is available). Default: ebook |
| --lang, --ocr-lang | OCR language(s) for Tesseract via ocrmypdf -l (example: isl+eng) |
| -V, --ocr-verbose | Show raw ocrmypdf output (passthrough) |

Compression levels

When Ghostscript is available, you can pick a target quality via --compress-level. If only qpdf is available, the script runs it with --object-streams=generate --compress-streams=y, which recompresses the PDF's streams without resampling images (levels are ignored).

| Level | Typical use |
|---|---|
| screen | Smallest files, lower quality |
| ebook | Balanced (default) |
| printer | Higher quality |
| prepress | Highest quality |
| default | Ghostscript default profile |

Examples

textify -d -z --compress-level screen  ~/Scans  ~/Scans_text   # smallest
textify -d -z --compress-level printer ~/Scans  ~/Scans_text   # higher quality
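
For reference, the chosen level ends up as Ghostscript's -dPDFSETTINGS value. The sketch below rebuilds the command line the script passes to gs for one file; the flags are taken from the script, while the function name and the validation step are illustrative additions.

```python
LEVELS = ("screen", "ebook", "printer", "prepress", "default")


def ghostscript_cmd(src: str, dst: str, level: str = "ebook") -> list[str]:
    """Build the gs argument list used to recompress src into dst."""
    if level not in LEVELS:
        raise ValueError(f"unknown compression level: {level}")
    return [
        "gs",
        "-dBATCH",
        "-dNOPAUSE",
        "-dQUIET",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.6",
        f"-dPDFSETTINGS=/{level}",
        f"-sOutputFile={dst}",
        src,
    ]


print(" ".join(ghostscript_cmd("in.pdf", "out.pdf", "screen")))
```

The script writes to a temporary file next to the original and replaces the original only if gs succeeds, so a failed compression leaves the OCR result untouched.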

Behavior & conventions

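  • Destination safety: Neither mode overwrites an existing destination file. Single-file mode stops with an error; directory mode reports that file as failed and continues with the rest.
  • Input filtering: Directory mode processes only files with .pdf suffix (case-insensitive) located directly in the source directory; subdirectories are not scanned.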
  • Combine output name: textified-combined.pdf, then textified-combined1.pdf, textified-combined2.pdf, … if needed.
  • Return codes:
      • 0: all requested steps completed successfully
      • 1: at least one file failed, or a requested step failed
      • 127: a required tool is missing (e.g., ocrmypdf); combine/compress tools may also trigger 127 when absent

Why so many combine tools?

The script prefers qpdf, then falls back to pdfunite (Poppler), then gs (Ghostscript).
Install at least one; qpdf yields the most consistent results for combining.


Troubleshooting

'ocrmypdf' not found in PATH

  • Install it (see prerequisites) and ensure your shell PATH includes your package manager’s bin directory.
  • If you only want combine/compress, you can use --no-ocr.

Combine says it can’t find a tool

  • Install one of: qpdf, poppler (for pdfunite), or ghostscript.
  • Re-run textify -d -c ....

Compress says it can’t compress

  • Install ghostscript (best) or qpdf.
  • If you only have qpdf, --compress-level is ignored.

No color / weird characters

  • Colors are enabled only if writing to a TTY. Set NO_COLOR=1 to disable.
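  • If stdout's encoding can't represent the heavy vertical bar (┃), the script falls back to | for the gutter. The other decorative characters (┏, ━, ┗, ◆, →) are always printed, so use a UTF-8 terminal for clean output.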

Uninstall

rm -f ~/bin/textify
sudo rm -f /usr/local/bin/textify   # only needed if you created the symlink

  • Remove the export PATH="$HOME/bin:$PATH" line (or the textify alias, if you used one) from your shell config (~/.zshrc, ~/.bashrc, etc).
  • Restart your shell.