Blog on Posit Open Source

Introducing Great Docs: Beautiful Documentation for Python Packages

Rich Iannone — Wed, 15 Apr 2026 00:00:00 +0000

When someone discovers your Python package, the first thing they see is the documentation site. That site should look good, feel cohesive, and reflect the identity of your project.

While building documentation sites for projects like Great Tables and Pointblank, we learned just how much effort goes into making a site that looks distinctive and matches the character of each project: custom themes, tailored layouts, interactive features, and countless small design decisions. That experience taught me what a great documentation site needs, and I wanted to distill those lessons into a tool that gives every Python package a polished site from the start. That led me to build Great Docs, a documentation generator that produces beautiful sites out of the box (but with simple options to customize the look and make it yours).

Great Docs is now part of the Posit open-source ecosystem and is available on PyPI. It is at v0.7, with seven releases since its initial soft launch.

What Is Great Docs?

Great Docs is a documentation site generator for Python packages. You point it at your project and it produces a documentation site with an API reference, a CLI reference, user guides, a changelog, and a landing page. It auto-discovers your package’s public API, generates Quarto-based pages, and renders the result into a static site.

The entire workflow involves just a few commands:

great-docs init       # one-time: auto-detect your package, write config
great-docs build      # build (or rebuild) the site
great-docs preview    # open it in your browser

There is no boilerplate to write, no templates to configure, and no theme to choose: the defaults produce a good-looking site on their own. If you want to go further, the great-docs.yml configuration file offers extensive customization: navbar gradients, content styles, announcement banners, author metadata, custom sections, and much more.

Why Another Documentation Generator?

There is no shortage of documentation tooling in the Python ecosystem, but after years of building and maintaining documentation sites for my own packages, a few pain points kept surfacing:

Discovery is manual. Most tools require you to explicitly list every class, function, and method you want documented, which becomes tedious and error-prone as your package grows.

The output looks dated. Many popular generators produce sites that feel like they belong to an earlier era of the web. Mobile responsiveness, dark mode, and modern typography are afterthoughts, if they are supported at all.

LLMs cannot easily consume the output. Developers routinely paste documentation into AI assistants, and if your docs are not structured for machine consumption, the answers those assistants produce will be worse.

Deployment is a separate project. Getting from a built site to a live GitHub Pages deployment often involves manually writing CI workflows.

No quality tools. Most generators stop at rendering. If you want to check for broken links, catch spelling and grammar issues, or understand how your API has changed across versions, you are on your own with third-party tools and custom scripts.

Great Docs addresses all of these, and the rest of this post walks through the key features.

Auto-Discovery: Your API, Documented Automatically

When you run great-docs init, the tool inspects your package and discovers its public API automatically. It finds classes, functions, dataclasses, protocols, enumerations, exceptions, type aliases, and more, using a combination of runtime introspection and static analysis via griffe. It detects your docstring style (NumPy, Google, or Sphinx) and writes a great-docs.yml configuration file that captures the full structure of your API.

Discovered objects are sorted into 13 distinct object-type categories. Classes with many methods get their own dedicated sections, overloaded signatures are displayed correctly, and long signatures are formatted across multiple lines for readability, all without you having to enumerate a single export.

If you later add new functions or classes to your package, just run great-docs init again (or great-docs build with the --scan option) and the configuration updates to reflect your current API.
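To give a feel for what runtime introspection involves, here is a minimal sketch that groups a module's public names using only the standard library. It is not Great Docs's implementation (which also relies on griffe for static analysis and recognizes many more categories); the `discover_public_api` helper and its four category labels are hypothetical and exist only for illustration.

```python
import importlib
import inspect


def discover_public_api(module_name: str) -> dict[str, list[str]]:
    """Group a module's public names into coarse categories (hypothetical helper)."""
    module = importlib.import_module(module_name)

    # Prefer an explicit __all__; otherwise fall back to names without a leading underscore.
    names = getattr(module, "__all__", None)
    if names is None:
        names = [n for n in dir(module) if not n.startswith("_")]

    groups: dict[str, list[str]] = {
        "exceptions": [],
        "classes": [],
        "functions": [],
        "other": [],
    }
    for name in sorted(names):
        obj = getattr(module, name, None)
        if inspect.isclass(obj) and issubclass(obj, BaseException):
            groups["exceptions"].append(name)
        elif inspect.isclass(obj):
            groups["classes"].append(name)
        elif inspect.isroutine(obj):
            groups["functions"].append(name)
        else:
            groups["other"].append(name)
    return groups


# Demonstrated on the standard library's json module.
api = discover_public_api("json")
print(api["functions"])
print(api["classes"])
print(api["exceptions"])
```

A real tool has to go further than this, for example to handle re-exports, overloads, and objects that cannot be imported safely, which is where static analysis becomes useful.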

Powered by Quarto and quartodoc

Great Docs is an evolution of quartodoc, which pioneered the idea of generating Quarto-based API reference pages for Python packages. All of the excellent work quartodoc does for API documentation (docstring rendering, cross-references between symbols, etc.) carries forward into Great Docs. quartodoc will continue to be maintained as a standalone tool for projects that need focused API reference generation, while Great Docs builds on that foundation to add site-wide features like styling, interactive widgets, CLI reference, quality tools, and the LLM-friendly output described below.

Under the hood, Great Docs generates Quarto .qmd files and renders them into a static site. Quarto is a scientific and technical publishing system that supports executable code cells, rich cross-references, callout blocks, tabsets, and many other features useful for documentation, and by building on it Great Docs inherits all of these capabilities. User guides can include live code examples with rendered output, API reference pages get syntax highlighting via Pygments, and the entire site benefits from Quarto’s rendering pipeline.

Styling and Interactive Features

Great Docs ships with a set of styles and interactive features that cover most of what you would want from a documentation site:

Dark mode toggle with system-preference detection and persistence across sessions
Animated navbar gradients with eight preset themes (sky, peach, prism, lilac, mint, ocean, sunset, forest)
Subtle content gradients that add visual warmth without distraction
GitHub widget showing live star and fork counts
Sidebar search for quickly filtering long API reference lists
Smart sidebar wrapping that handles long class and method names gracefully
Responsive design that works well on phones and tablets
Copy Page widget for one-click copy-as-Markdown on reference pages
Back-to-top button that appears after scrolling, with smooth animation and dark mode support
Keyboard navigation with shortcuts for search (s), page browsing ([/]), dark mode (d), and a help overlay (?)
Social cards with automatic Open Graph and Twitter Card meta tags for rich link previews
Page tags for categorizing pages via YAML frontmatter, rendered as pill-shaped links above the title with an auto-generated tags index page
Page status badges marking pages as new, beta, deprecated, or experimental, with color-coded badges below the title and compact icons in the sidebar
Inline icons via the {{< icon name >}} shortcode, giving access to 1,900+ Lucide icons that scale with text and inherit color
Navigation icons via Lucide for navbar and sidebar labels
Internationalization with support for 23 languages via a single site.language config option
JS Tooltips replacing native browser tooltips with styled, theme-aware popovers
License feature badges showing permissions, conditions, and limitations as color-coded badge groups

Dark mode and sidebar filtering are expected in any modern documentation site, keyboard shortcuts let you navigate without reaching for the mouse, and the navbar gradients give each project a distinct visual identity without requiring any design work from the package author.

LLM-Friendly by Default

Great Docs automatically generates llms.txt and llms-full.txt files alongside your documentation site: structured, plain-text representations of your entire documentation designed for consumption by large language models.
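For readers who have not seen an llms.txt file, the sketch below assembles one following the general shape of the llms.txt proposal: a title heading, a short blockquote summary, and sections of annotated links. The exact content Great Docs writes is not described here, so the `build_llms_txt` helper, the package name, and the URLs are all hypothetical.

```python
def build_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render a minimal llms.txt-style document (hypothetical helper).

    `sections` maps a section heading to (link text, URL, short note) tuples.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Placeholder project and URLs, for illustration only.
text = build_llms_txt(
    title="My Package",
    summary="A small example package.",
    sections={
        "Reference": [
            ("MyApp", "https://example.com/reference/MyApp.md", "Main application class"),
        ],
    },
)
print(text)
```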

Every reference page also gets a parallel Markdown version, and a “Copy Page” widget lets users (or AI assistants) grab the content of any page in Markdown format with a single click. A “View as Markdown” option renders the plain-text version directly in the browser.

Great Docs also supports the Agent Skills specification. If your project contains a SKILL.md file, Great Docs automatically discovers it, publishes the skill at /.well-known/skills/, and serves it for agent discovery without any configuration. If you do not have a hand-written skill, Great Docs generates one for you. Either way, coding agents like Claude Code, GitHub Copilot, Cursor, and Codex can find and install your package’s skill with a single command:

npx skills add https://your-org.github.io/your-package/

A hand-written SKILL.md can be as comprehensive as you like. A detailed skill can include core concepts, step-by-step workflows, common error patterns with fixes, reference files for configuration and CLI commands, setup and build scripts, and a description of what the agent can and cannot do autonomously. Great Docs publishes the entire skill directory structure and renders it on a dedicated Skills page in your documentation site, with a visual layout of the skill’s file tree, expandable sections for each companion file (references, scripts, assets), and install instructions for every major agent platform.

The practical effect is that when a developer asks an AI assistant “how do I use the build() method?”, the quality of the answer depends on the quality of the documentation the model has access to. And when a coding agent needs to configure, build, or troubleshoot your package’s docs, a structured skill file gives it enough context to act autonomously. These features mean the documentation works for human readers, LLM chat contexts, and agentic workflows.

CLI Documentation

If your package has a Click-based command-line interface, Great Docs can generate reference pages for every command automatically. It discovers your CLI commands, renders each one with its arguments, options, and help text, and adds them to the site with a dedicated sidebar. A reference switcher dropdown lets users toggle between the API Reference and CLI Reference views.

All you have to do is specify the CLI module in your configuration:

cli:
  enabled: true
  module: my_package.cli
  name: cli

Click is the starting point, but the roadmap includes support for Typer, argparse, and Fire, with a plugin interface for additional frameworks. The plan is also to move beyond raw --help output toward richer, API-reference-style rendering: styled parameter tables with type annotations and defaults, auto-generated usage examples, cross-references between subcommands and parent groups, environment variable documentation, and full search integration so CLI commands are indexed alongside API symbols.

User Guides, Custom Sections, and Blogs

API reference alone is rarely sufficient. Developers need narrative documentation that explains concepts, walks through workflows, and provides context that docstrings cannot.

Great Docs supports a user_guide/ directory where you place .qmd or .md files. These are automatically discovered and added to the site with their own navigation section. You can use numeric prefixes (e.g., 01-installation.qmd, 02-quickstart.qmd) to control ordering in the directory; Great Docs strips the prefixes from the generated URLs, so readers see clean paths like /user_guide/installation.html. Subdirectories are supported for hierarchical organization. You can also add hand-written HTML pages, which are auto-discovered and integrated into the site with minimal transformation; this is useful for product landing pages, interactive demos, or any content that does not fit the .qmd workflow.
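The prefix-stripping idea can be sketched in a few lines. This is an illustration of the convention rather than Great Docs's own code: the `clean_url` helper is hypothetical, and the assumed prefix pattern (digits followed by a hyphen or underscore, on the file name only) may differ from what the tool actually accepts.

```python
import re
from pathlib import PurePosixPath

# Assumed convention: one or more digits followed by "-" or "_" at the start of the file name.
_PREFIX = re.compile(r"^\d+[-_]")


def clean_url(source_path: str) -> str:
    """Map a source file to its output URL, dropping the ordering prefix (hypothetical helper)."""
    path = PurePosixPath(source_path)
    stem = _PREFIX.sub("", path.stem)
    # Only the file name is rewritten; directory names are left untouched.
    return "/" + str(path.with_name(stem + ".html"))


files = ["user_guide/02-quickstart.qmd", "user_guide/01-installation.qmd"]

# Sorting by file name gives the navigation order; the URLs stay clean.
for f in sorted(files):
    print(f, "->", clean_url(f))
```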

Beyond user guides, you can define arbitrary custom sections for recipes, tutorials, examples, or any other content grouping. And if you want a blog, Great Docs supports blog-type sections with chronological listings and Quarto’s blog features built in.

sections:
  - title: Recipes
    dir: recipes
    navbar_after: User Guide

Source Code Links

Every documented class, method, and function gets an automatic link back to its source code on GitHub, with precise line-number ranges. This is powered by griffe’s static analysis, so the links point to exactly the right lines, not just the file. The placement is configurable (in the usage section or next to the title), and you can specify the branch or tag to link against.
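To make the idea of a line-precise link concrete, the sketch below builds a GitHub URL with a line-range fragment. Note the swap in technique: Great Docs uses griffe's static analysis, whereas this example uses runtime `inspect` from the standard library because it needs no extra dependencies. The `source_link` helper is hypothetical, and the repository, branch, and file path passed to it are assumptions chosen for the demonstration.

```python
import inspect
import json


def source_link(obj, repo_url: str, ref: str, path_in_repo: str) -> str:
    """Build a GitHub URL pointing at the lines where `obj` is defined (hypothetical helper)."""
    lines, start = inspect.getsourcelines(obj)
    end = start + len(lines) - 1
    # GitHub highlights a range when the fragment has the form #L<start>-L<end>.
    return f"{repo_url}/blob/{ref}/{path_in_repo}#L{start}-L{end}"


# The line numbers come from the locally installed module, so they only match
# the linked file if `ref` corresponds to the installed version.
link = source_link(
    json.dumps,
    repo_url="https://github.com/python/cpython",
    ref="main",
    path_in_repo="Lib/json/__init__.py",
)
print(link)
```

The version caveat in the comment is the reason it helps to be able to choose the branch or tag to link against: pinning links to a release tag keeps the line ranges stable.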

One-Command Deployment

When your documentation is ready, deploying to GitHub Pages is a single command:

great-docs setup-github-pages

This generates a GitHub Actions workflow file that builds your documentation and deploys it automatically on every push. The workflow is configurable (branch, Python version, docs directory), and once it is in place, your documentation stays up to date with every commit.

Quality Tools Built In

Great Docs includes tooling for keeping your documentation healthy:

great-docs check audits your configuration against your actual exports, showing what is documented and what is missing
great-docs check-links finds broken links across your documentation, with configurable timeouts and ignore patterns
great-docs lint analyzes your public API for missing docstrings, broken cross-references, style mismatches, and unknown directives
great-docs proofread runs local grammar and spelling checks powered by Harper, with a built-in technical dictionary and multiple output formats

All of these produce machine-readable JSON output, making them suitable for CI integration.

Configuration

The great-docs.yml configuration file is the single source of truth for your documentation site. The defaults are sensible enough that most projects need very little customization, but when you do want to adjust things the options are extensive: display names, author metadata with ORCID identifiers, funding and copyright information, announcement banners, homepage modes, theme settings, sidebar behavior, and more.

Here is what a configuration might look like for a well-established project:

display_name: My Package

announcement:
  content: "Version 2.0 is here! Check out the changelog."
  type: info
  style: lilac
  dismissable: true

navbar_style: sky
content_style: lilac

authors:
  - name: Mara Rosario
    role: Maintainer
    github: mbrosario
    orcid: 0000-0002-8471-3056

sections:
  - title: Tutorials
    dir: tutorials
    navbar_after: User Guide

cli:
  enabled: true
  module: my_package.cli
  name: cli

reference:
  sections:
    - title: Core
      contents: [MyApp, Config]
    - title: Utilities
      contents: [parse, validate, format_output]

The Iterative Workflow

In practice, using Great Docs looks like this:

great-docs init: Run once to scaffold your configuration. Review the generated great-docs.yml and make any adjustments (reorder sections, add authors, enable CLI docs, etc.).
Edit great-docs.yml: Customize display name, add announcement banners, configure sections, set navbar gradients. This file is committed to your repository.
great-docs build: Rebuild whenever you want to see changes. This is fast and idempotent.
great-docs preview: Open the built site locally to review.
Iterate: Adjust configuration, write user guide pages, add recipes. Rebuild and preview.
great-docs setup-github-pages: When you are ready to go live, set up automated deployment. From this point on, every push to your main branch publishes updated documentation.

The great-docs.yml file and your user_guide/ directory are the only things you commit. The entire great-docs/ build directory is ephemeral and gitignored.

What’s Next

Great Docs is under active development and has shipped seven releases (v0.1 through v0.7) since its initial launch. The roadmap is quite ambitious. Near-term priorities include author attribution with GitHub-style avatars, reading time estimates, breadcrumb navigation, and an enhanced CLI reference supporting Typer, argparse, and Fire alongside Click. Further out, the roadmap covers multi-version documentation with a version selector, multi-language API documentation spanning Python, R, Rust, and JavaScript, interactive examples powered by Pyodide or JupyterLite, notebook galleries, instant SPA-like page navigation, and a plugin system for third-party extensions.

Get Started

Great Docs is open source under the MIT license and available on PyPI. To get started:

pip install great-docs
cd your-python-project
great-docs init
great-docs build
great-docs preview

If you have feedback, feature ideas, or run into issues, please open an issue on GitHub.

Chrome Headless Shell in Quarto

Christophe Dervieux — Tue, 14 Apr 2026 00:00:00 +0000

Quarto uses a headless browser behind the scenes to render Mermaid and Graphviz diagrams to PNG for print formats like PDF and DOCX. Until now, this meant installing Puppeteer-bundled Chromium via quarto install chromium — a setup that worked, but came with some rough edges.

Starting with Quarto 1.9, quarto install chromium is deprecated. The recommended replacement is Chrome Headless Shell, a lightweight, headless-only browser from Google’s Chrome for Testing infrastructure.

Why the switch?

Puppeteer-bundled Chromium served Quarto well, but it had limitations that kept coming up:

System dependencies in containers: The Puppeteer Chromium binary requires system libraries that aren’t always present in minimal Docker images or WSL environments. This led to cryptic errors that were hard to debug.
No arm64 Linux support: Puppeteer didn’t distribute Chromium builds for arm64 Linux, leaving users without an easy install path.
Large download size: The full Chromium bundle is significantly larger than what Quarto actually needs for headless rendering.

Chrome Headless Shell solves all three. It’s purpose-built for headless automation, has fewer system dependencies, ships arm64 Linux builds, and is smaller to download.

Installing Chrome Headless Shell

If you don’t already have Chrome or Edge installed on your system, install Chrome Headless Shell:

Terminal

quarto install chrome-headless-shell

Quarto will automatically detect and use any existing Chrome or Edge browser on your system. Chrome Headless Shell is the recommended fallback for environments where a full browser isn’t available — CI servers, Docker containers, and headless VMs.

Migrating from Chromium

If you previously installed Chromium via quarto install chromium, the migration is straightforward:

Terminal

quarto uninstall chromium
quarto install chrome-headless-shell

Running quarto check install will warn you if a legacy Chromium installation is detected and suggest the migration.

CI and automation

If your CI pipeline uses quarto install chromium --no-prompt, it will continue to work — the command still installs a working headless browser, but now shows a deprecation warning. Updating your scripts to quarto install chrome-headless-shell --no-prompt avoids the warning and uses the new tool directly. In Quarto 1.10, quarto install chromium will transparently redirect to Chrome Headless Shell, so either command will produce the same result.

What’s next

The transition away from Puppeteer Chromium is happening gradually. In Quarto 1.9, quarto install chromium shows a deprecation warning and quarto check install flags any legacy Chromium installation. In the upcoming Quarto 1.10, quarto install chromium will transparently redirect to Chrome Headless Shell, and installing Chrome Headless Shell will auto-remove any legacy Chromium.

If you’re still using the old Chromium install, now is a good time to switch.

Learn more on the Chrome Install documentation page.

The Chromium icon in the listing and social card image for this post is by Jeremiah via icon-icons.com. License: CC BY 4.0

RAG with raghilda

Daniel Falbel — Tue, 14 Apr 2026 00:00:00 +0000

We’re happy to introduce raghilda, a new Python package for building RAG (Retrieval-Augmented Generation) solutions.

RAG is a simple concept that comes up anytime you want to retrieve content for an LLM to improve or augment the generated output.

Without RAG, the LLM generates a response using only the user query. With RAG, relevant documents are retrieved and provided to the LLM before it generates a response.

LLMs are great at reasoning and generating text, but their knowledge is frozen at training time. They can’t access private documents, recent information, or anything that wasn’t in the training data. When asked about these topics, they either refuse to answer or — worse — hallucinate a confident-sounding response. RAG solves this by giving the model access to relevant information at query time, without needing to retrain it.

In practice, most tools built on top of LLMs already do this. ChatGPT uses web search to include recent news in its answers. Claude Code reads the codebase using tools like grep, list_files, and symbol_search before generating code. RAG is what makes LLMs useful for tasks that require specific, up-to-date, or private knowledge.

Modern LLMs support context windows of 100K tokens or more, which might seem to make RAG unnecessary: just paste everything into the prompt. But this doesn’t work well in practice. LLMs suffer from the “lost in the middle” problem (sometimes called “context rot”): they pay less attention to information buried in the middle of a very long prompt, so relevant content gets missed. On top of that, sending your entire knowledge base with every query is expensive and slow, and most real-world document collections are too large to fit in a single context window anyway. Long context and RAG are complementary: a larger window lets you include more retrieved chunks, but retrieval is what gives you precision, so the model sees 5 relevant paragraphs instead of 500 irrelevant pages.

raghilda

Building a good retrieval system is the hard part of RAG. raghilda is a Python framework designed to handle it. You give it URLs or file paths, and it takes care of the rest. The defaults are opinionated but transparent — every step is exposed and replaceable. You can swap the chunker, change the embedding provider, or write a custom ingestion function without fighting the framework.

Document processing. Converting HTML pages, PDFs, and DOCX files into clean text is surprisingly messy. raghilda handles this automatically, converting documents to Markdown. For websites, find_links() crawls and discovers pages so you don’t have to list them by hand.
Smart chunking. Naive fixed-size chunking can split a code block or paragraph in half. raghilda’s Markdown chunker splits text at semantic boundaries — headings, paragraphs, sentences — and preserves the heading hierarchy so each chunk retains context about where it came from.
Multiple storage backends. raghilda supports DuckDB (local, zero-config), ChromaDB, and OpenAI Vector Stores. The API is the same across backends, so you can start with a local DuckDB file and move to a hosted solution later without rewriting your code.
Hybrid retrieval. Pure vector search finds semantically similar content but misses exact keyword matches. raghilda combines semantic search, BM25 keyword matching, and attribute filters — so you can search by meaning, by keywords, and by metadata (e.g. source URL, document type, or any custom attribute) all at once.
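One common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF), sketched below. How raghilda merges its results is not specified in this post, so treat this purely as an illustration of the general idea; the function name, the chunk IDs, and the conventional constant k=60 are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of IDs into one list using RRF scores."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Items near the top of any list contribute the most.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a BM25 keyword search.
semantic = ["chunk_a", "chunk_b", "chunk_c"]
keyword = ["chunk_b", "chunk_d", "chunk_a"]

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

Chunks that appear in both lists (here chunk_a and chunk_b) rise to the top, which is the behavior you want from hybrid search: agreement between meaning-based and keyword-based signals is rewarded without having to compare their raw scores directly.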

How it works

A retrieval system has two phases: ingestion, which turns your documents into a searchable store, and retrieval, which finds the right chunks for a given query. raghilda keeps the two phases separate, and each step is an individual call you can customize or replace.

raghilda’s two phases: ingestion prepares your documents for search, retrieval finds the relevant chunks at query time.

Let’s walk through a minimal example using a Wikipedia article about Princess Ragnhild of Norway.

First, you create a store with an embedding provider. The store is where your chunks and their vector embeddings will live:

from raghilda.store import DuckDBStore
from raghilda.embedding import EmbeddingOpenAI

store = DuckDBStore.create(
    location="ragnhild.db",
    embed=EmbeddingOpenAI(),
    overwrite=True,
)

Then you read the document and chunk it. read_as_markdown() converts the URL to Markdown, and MarkdownChunker splits it into overlapping chunks at semantic boundaries:

from raghilda.read import read_as_markdown
from raghilda.chunker import MarkdownChunker

doc = read_as_markdown(
    "https://en.wikipedia.org/wiki/Princess_Ragnhild,_Mrs._Lorentzen"
)

# We intentionally use a small chunk size for display purposes.
# In practice, chunk sizes of ~1600 characters are a good
# compromise on size versus retrieval quality.
chunker = MarkdownChunker(chunk_size=200, target_overlap=0.25)
chunked = chunker.chunk(doc)
print(f"{len(chunked.chunks)} chunks")

185 chunks

Finally, you upsert the chunked document into the store and build the search indexes. Embedding is handled by the store itself — since the embedding provider is configured at creation time, all chunks in a store are guaranteed to use consistent embeddings:

store.upsert(chunked)
store.build_index()

During retrieval, you query the store and get back the most relevant chunks. raghilda runs semantic search and BM25 keyword search, then merges the results:

chunks = store.retrieve("Did she move to Brazil?", top_k=2)
for chunk in chunks:
    print(chunk.text)
    print("---")

Lorentzen"), a member of the Lorentzen family of shipping magnates.
In the same year, they moved to Brazil, where her husband was an
industrialist and a main owner of Aracruz Celulose. She lived in
Brazil until her death 59 years later.
---
to Rio de Janeiro, where her husband had substantial business
holdings. Their residence in Brazil was originally temporary, but
they
---

Tip: In practice, you’ll usually work with many documents at once — just loop over your URLs or file paths and call upsert() for each one. See the Getting Started guide for a full walkthrough.
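The loop mentioned in the tip might look like the sketch below. The `ingest_all` helper is hypothetical; it takes the reader, chunker, and store as arguments so that it mirrors the calls shown earlier (read_as_markdown(), chunker.chunk(), store.upsert(), store.build_index()) without assuming anything else about the raghilda API. Building the indexes once after all upserts is also an assumption, made here to avoid repeating that work for every document.

```python
from typing import Callable, Iterable


def ingest_all(sources: Iterable[str], read: Callable, chunker, store) -> int:
    """Read, chunk, and upsert each source, then build the indexes once.

    Hypothetical helper: `read`, `chunker`, and `store` are expected to behave
    like read_as_markdown, MarkdownChunker, and DuckDBStore in the walkthrough.
    """
    count = 0
    for source in sources:
        doc = read(source)                # convert the URL or file to Markdown
        store.upsert(chunker.chunk(doc))  # chunk, then embed and store
        count += 1
    store.build_index()                   # assumed to be needed only once, at the end
    return count
```

With the objects created in the walkthrough, a call would be ingest_all(urls, read_as_markdown, chunker, store), where urls is your own list of URLs or file paths.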

Using with an LLM

A retrieval system on its own just returns text chunks. To get actual answers, you connect it to an LLM. The simplest way to do this is to register a search function as a tool that the LLM can call when it needs information.

Here’s an example using chatlas:

from chatlas import ChatOpenAI
import json

def search_ragnhild(query: str) -> str:
    """Search for information about Princess Ragnhild."""
    chunks = store.retrieve(query, top_k=10)
    return json.dumps(
        [{"text": c.text, "context": c.context} for c in chunks]
    )

chat = ChatOpenAI(
    model="gpt-4.1-mini",
    system_prompt="Answer questions about Princess Ragnhild "
    "using the search tool. Always search before answering.",
)
chat.register_tool(search_ragnhild)
_ = chat.chat("Which year did she move to Brazil?", echo="text")

Princess Ragnhild moved to Brazil in the same year she got married,
1953. Her marriage and the move to Brazil were connected, as her
husband was an industrialist and owner of business holdings in Brazil.
They settled in Rio de Janeiro and lived there until her death in 2012.

Compare this with the same question without the search tool:

chat_no_rag = ChatOpenAI(
    model="gpt-4.1-mini",
    system_prompt="Answer questions about Princess Ragnhild.",
)
_ = chat_no_rag.chat(
    "Which year did she move to Brazil?",
    echo="text",
)

Princess Ragnhild moved to Brazil in 1960.

With the search tool, the LLM retrieves the relevant chunks and grounds its answer in the actual document. Without it, the model has to rely on its training data — and may hallucinate or give a vague response.

Learn more

Getting Started — full walkthrough building a store from a documentation site
Examples — complete scripts showing RAG workflows with chatlas, ChromaDB, and more
GitHub repository — source code, issues, and contributions

Structuring Reproducible Research Projects in R: A Workflow with renv, Quarto, and GitHub

Dianyi Yang — Mon, 13 Apr 2026 00:00:00 +0000

Increasingly, academic disciplines, including the social sciences, are adopting data-science tools and calling for greater transparency and reproducibility in research. Many leading journals now require authors to share the data and code necessary to replicate published findings as a condition of publication.

Yet preparing replication materials can be daunting, especially for researchers new to data science. It is not uncommon for scholars to struggle to reproduce even their own results, due to issues such as disorganized code and data, software version mismatches, missing random seeds, or differences in operating systems and platforms. While some of these challenges are complex, many reproducibility problems stem from preventable organizational issues. Structuring code and data in a clear, consistent, and tool-friendly manner can significantly reduce these difficulties — and brings additional benefits at virtually no cost, including smoother collaboration and easier integration with AI tools.

In this post, I’ll walk through the project structure I use in my own research. It has evolved over time and reflects what I currently consider best practice for reproducible academic work. Adopting a clear structure from the outset makes it significantly easier to reproduce results, onboard collaborators, and maintain projects over the long term.

The structure covers the full research lifecycle: from data cleaning and analysis to manuscript preparation and presentation. The template is built around R and Quarto (the successor to R Markdown), but the underlying principles translate easily to Python and other publishing workflows, including LaTeX. For convenience, I’ve created a ready-to-use template on GitHub that you can clone and adapt for your own projects.

The project structure looks like this:

I’ll now break down each component in turn, explaining the purpose of the key folders and files, and the reasoning behind the design.

Git and GitHub

One major threat to reproducibility is a file named latest_final_v3_definitive.R. While this naming convention feels natural during exploratory analysis—especially under deadline pressure—it quickly becomes impossible to reconstruct what actually changed, when, and why. Future-you (and your collaborators) will not be grateful.

Version control systems like Git solve this problem by recording a structured history of changes through commits. Instead of creating new files for every iteration, you preserve a single evolving project with a transparent timeline. This makes it easy to revisit earlier versions, understand how results evolved, and collaborate without overwriting each other’s work.

GitHub builds on this by providing a shared platform for hosting repositories, reviewing changes, managing issues, and coordinating collaboration. It also makes it very clear who introduced a particular change—an accountability feature that tends to concentrate the mind when committing code.

In the example project structure, several files and folders are Git-related:

.git/ (invisible folder created by Git)
.gitignore
.gitattributes
*/
└── .gitignore

The .git/ folder is created automatically when you initialize a repository. It contains the full version history and metadata of the project. You generally do not need to interact with it directly—just avoid modifying or deleting it.

The .gitignore file specifies which files and folders Git should ignore. Since all changes to tracked files are recorded in the project history, a useful rule of thumb is to track only what is necessary to reproduce your results. Files that can be regenerated from code (such as intermediate data, plots, tables, or compiled PDFs) are often better ignored to keep the repository clean and lightweight.

The .gitattributes file defines additional rules for how Git should handle specific file types. Compared with .gitignore, it appears less often in many repositories, but it is especially useful when you need file-type-specific behavior. In particular, large raw data files (e.g., large .csv files) and binary formats (such as .rds, .RData, or images) can significantly increase repository size if tracked directly. In these cases, Git Large File Storage (Git LFS) can be used to store the actual files separately, while keeping lightweight pointer files in the main repository history.
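
For reference, running git lfs track "data_raw/*.csv" writes a rule like the one below to .gitattributes. The path pattern is an assumption based on the template's data_raw/ folder; adjust it to match your own files.

```gitattributes
# Route matching files through Git LFS instead of storing them in regular history
data_raw/*.csv filter=lfs diff=lfs merge=lfs -text
```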

Data directories

I use two separate folders to store raw (data_raw/) and cleaned data (data_processed/). Keeping these distinct makes the workflow more transparent: the original data remains untouched, while processed data can be saved in a format that is fast to load during analysis.

This structure also aligns with common journal expectations that replication code should be able to reproduce results starting from the raw data. By preserving raw inputs and separating preprocessing steps, you make the transformation pipeline explicit rather than implicit.

data_raw/
└── *.csv (tracked through Git LFS)

data_processed/
├── *.rds (not tracked in Git)
└── .gitignore

In the template, the example raw data file (data_raw/Brexit.csv) is tracked using Git LFS. While the file itself is neither large nor binary (unlike formats such as .dta or .rds), LFS is used here for demonstration purposes. In real-world research projects, raw datasets are often substantial in size, and setting up LFS early helps avoid repository bloat later on.

The data_processed/ folder contains the cleaned dataset in .rds format, along with a .gitignore file that excludes all files in this folder (except the .gitignore file itself) from version control. This design encourages regeneration of cleaned data from raw inputs via code, rather than relying on previously saved intermediate objects. It also helps keep the repository lightweight.

The .rds format is used for two reasons. First, it is a native R format that preserves data types and attributes faithfully. Second, it avoids a pitfall of .RData: load() restores every stored object into the global environment under its original name, whereas an .rds file holds a single object that must be assigned explicitly when it is read. This reduces the risk of unintentionally overwriting existing objects and promotes more transparent workflows.
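
The difference is easiest to see side by side. The sketch below uses base R's saveRDS()/readRDS() (which readr's write_rds()/read_rds() wrap) and save()/load() with temporary files. The object names are made up for illustration and are not part of the template.

```r
# Illustrative object and temporary files (hypothetical, not from the template)
scores <- data.frame(id = 1:3, value = c(0.2, 0.5, 0.9))
rds_path   <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")

saveRDS(scores, rds_path)        # stores a single object, without its name
save(scores, file = rdata_path)  # stores the object together with its name

# Simulate a later session in which `scores` already holds something else
scores <- "something else entirely"

# .rds: the caller chooses the name, so nothing existing is overwritten
scores_from_rds <- readRDS(rds_path)
still_untouched <- is.character(scores)

# .RData: load() restores objects under their saved names, silently
# replacing the existing `scores`, and returns the names it restored
restored_names <- load(rdata_path)
```

After the last line, scores is a data frame again even though nothing was assigned to it explicitly, which is exactly the behaviour that .rds avoids.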

Finally, the .gitignore file inside data_processed/ serves an additional purpose: it ensures that the folder itself is tracked by Git. Since Git does not record empty directories, this placeholder guarantees that the directory exists when the project is cloned. This is important because saving cleaned data to a non-existent directory will otherwise result in an error in R.
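
A directory-level .gitignore with this behaviour usually needs only two patterns. The following is a minimal sketch of what data_processed/.gitignore could contain; the template's actual file may differ.

```gitignore
# Ignore everything in this directory...
*
# ...except this file, so the directory itself stays in the repository
!.gitignore
```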

For researchers working with confidential or restricted data, it is often helpful to separate sensitive materials from those that can be shared publicly. In some cases, this may mean maintaining a private repository during the research phase and preparing a separate public-facing repository upon publication.

Importantly, simply deleting confidential files before making a repository public is not sufficient. Git preserves the full commit history, meaning sensitive data may still be accessible in earlier revisions. Creating a fresh repository that contains only the materials intended for public release—such as replication code and non-sensitive data—helps prevent unintended data leakage.

This approach also makes it easier to curate a clean, well-documented version of the project specifically designed for replication and reuse.

Code organization and outputs

code/
├── 1_data_cleaning.R
├── 2_descriptive.R
└── 3_main_analysis.R

outputs/
├── *.rds (not tracked in Git)
└── .gitignore

The code/ folder contains all scripts related to data processing and analysis. In the template, these are written in R, but the same structure works equally well for Python scripts (.py), notebooks (.ipynb), or other programming languages.

A key principle is to prefix scripts with numbers and descriptive names that reflect the research workflow: data cleaning, descriptive analysis, and then inferential analysis. If the logical order of steps is unclear, following the sequence in which results appear in the manuscript is often a reliable guide. This makes the analytical pipeline explicit and allows others (and future-you) to reproduce results step-by-step.

Each script should include a header section with basic metadata about its role in the project. This may include the paper title, authors, purpose of the script, input files, and output files. Making inputs and outputs explicit helps clarify dependencies and encourages scripts that transform data rather than rely on objects lingering in the global environment.

Below is an example of an analysis script (code/3_main_analysis.R) I used in the template that follows these principles:

######## INFO ########

# PROJECT
## Paper: YOUR PAPER TITLE
## Authors: YOUR NAME

# R Script
## Purpose: This script performs linear regression analysis.
## Inputs: data_processed/Brexit.rds
## Outputs: outputs/regression_table.rds

# Setup ----

library(tidyverse)
library(here)
library(modelsummary)

i_am("code/3_main_analysis.R") # helps with relative paths

# Read in the cleaned data ----

brexit_data <- read_rds(here("data_processed/Brexit.rds"))

# Main analysis ----

model <- lm(leave ~ turnout + income + noqual, data = brexit_data)

# Output the model summary ----

reg_table <- modelsummary(model, stars = TRUE, output = "latex")
write_rds(reg_table, here("outputs/regression_table.rds"))

Two additional tips are worth noting. First, the # SECTION ---- syntax creates collapsible sections and structured outlines in RStudio and the Positron IDE. This makes longer scripts significantly easier to navigate and encourages more intentional organization of code.

Second, managing working directories is a seemingly basic task that is frequently mishandled—especially in collaborative projects. In RStudio, the recommended approach is to work within an .Rproj file, which defines a project root and ensures that relative paths behave consistently across machines. However, in many academic settings (including political science), this practice is not systematically taught. As a result, it is still common to see replication files from top journals that begin with something like setwd("~/Path/To/Project"), often commented out with the expectation that collaborators will manually adjust it.

This approach is fragile. It assumes a specific directory structure on every machine and introduces hidden dependencies on local file paths. Code that depends on setwd() is difficult to port, share, or automate.

Positron improves this situation by automatically setting the working directory to the folder opened in the IDE, encouraging a project-level workflow by default. However, many users carry over the habit of opening individual scripts rather than entire project directories.

The here package provides a robust solution that avoids reliance on the working directory altogether. By anchoring a script to the project root using here::i_am() and constructing paths with here(), file references become explicit and portable. This ensures that scripts run consistently across machines, IDEs, and collaboration environments—regardless of local directory structures.

The outputs/ folder complements this approach by providing a dedicated location for all results generated by the code. This includes intermediate objects (e.g., fitted models) and final products (e.g., tables and figures). If intermediate artifacts become numerous, they can be stored in a separate objects/ folder. The accompanying .gitignore file ensures that these generated files are not versioned, reinforcing the principle that results should be regenerated from code rather than preserved as static artifacts.

Virtual environments and dependencies (`renv`)

Many R users do not update their R or package versions regularly. In practice, the version in use is often determined by when the researcher first learned R—or when the machine was purchased, whichever is later. Deprecation warnings are politely ignored as long as the code continues to run.

The problem only becomes visible when code that worked perfectly on your old machine suddenly fails on a new machine—or worse, on a collaborator’s machine. At that point, dependency management stops being an abstract concern and becomes a very practical one.

A common solution to this problem is the use of virtual environments, which capture the exact package versions used in a project—a concept long established in the Python ecosystem. In R, the renv package provides a convenient way to create and manage project-specific libraries. Package versions remain fixed within the project, independent of updates to the global R installation or differences across collaborators’ machines.

The state of the environment is recorded in the renv.lock file, which is committed to Git. This file serves as a snapshot of the project’s dependency graph at a given point in time, including package versions and their sources. As a result, the same software environment can be reproduced on any machine with a simple call to renv::restore().
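
A typical renv cycle can be sketched as follows. The three renv calls are standard functions from the package; wrapping them in helper functions with hypothetical names is only a device for this example, so that nothing is installed or modified until you call them from the project root.

```r
# Hypothetical wrappers around standard renv calls.
# Defining them has no side effects; call them from the project root.

# Run once by the project owner: creates renv/, .Rprofile, and renv.lock
start_project_library <- function() renv::init()

# Run after installing or updating packages: records versions in renv.lock
record_package_versions <- function() renv::snapshot()

# Run by collaborators after cloning: installs the versions listed in renv.lock
rebuild_project_library <- function() renv::restore()
```

In day-to-day use you would normally type renv::snapshot() or renv::restore() directly in the console rather than go through wrappers like these.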

Three key components related to renv appear in the template structure:

renv/
.Rprofile
renv.lock

The renv/ folder contains the project-specific package library. Most of its contents are not tracked in Git, since the environment can be regenerated from the information stored in renv.lock. Additionally, some packages include compiled C or C++ code that is platform-specific, meaning those installed binaries are not portable across operating systems.

The .Rprofile file includes a line that automatically activates the renv environment when the project is loaded, ensuring that the correct package versions are used without manual setup.

Automatic activation, however, only works when the project is opened as a whole—for example, by opening the .Rproj file or the project folder in Positron. If individual scripts are opened in isolation, the working directory will not be set to the project root at startup, and the project-level .Rprofile will not be executed. This is yet another reason to adopt a project-level workflow rather than treating scripts as standalone files.

Quarto: manuscripts and presentations

The most visible stage of the research lifecycle is the dissemination of results—through manuscripts, presentations, and other public outputs. For quantitatively oriented researchers (which, if you have read this far, likely includes you), producing well-formatted documents that do justice to your carefully constructed tables and figures is essential.

Quarto is an open-source scientific and technical publishing system built on Pandoc. I use it for all of my manuscript writing and presentation slides, and I recommend it for researchers working in R, Python, or Julia. The template presented here is deliberately centered around Quarto, and in what follows I will briefly explain the reasoning behind that choice.

Why Quarto?

What makes Quarto stand out is its ability to run R, Python, or Julia code directly within a document, seamlessly integrating analysis and writing. Tables, figures, and results can therefore be generated and updated automatically as the underlying code changes, ensuring that the manuscript always reflects the current state of the analysis.

Quarto is not a replacement for output formats such as HTML, Microsoft Word, LaTeX, or Typst. Rather, it acts as a unifying layer that can render the same .qmd source file into multiple formats simultaneously. This flexibility allows format decisions to be postponed and adapted to collaborators, institutions, or journal requirements.
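
As a sketch, a .qmd file that targets several formats lists them under format in its YAML front matter. The particular formats chosen here are illustrative and are not the template's configuration.

```yaml
format:
  html: default
  pdf: default
  docx: default
```

Running quarto render on such a file then produces one output per listed format.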

Beyond manuscripts, Quarto also supports presentation formats and full websites. Learning a single tool therefore enables the production of academic papers, conference slides, and project or personal websites within a consistent workflow.

While these capabilities may sound familiar to experienced R Markdown users, I would still argue that Quarto is worth trying—even for those already comfortable with R Markdown—for four main reasons:

Language-agnostic support: Quarto is designed to work seamlessly with multiple programming languages (R, Python, Julia). A document can be executed using the native engine for each language—for example, a Python-only document runs through a Jupyter kernel without requiring R¹, which is more friendly to non-R users.
Native support for extended features: Quarto includes built-in support for cross-referencing, citations, and advanced formatting without requiring additional packages or complex configurations. In contrast, R Markdown often relies on extensions such as bookdown to achieve similar functionality, which introduces additional dependencies. In practice, many students are taught only the basic R Markdown setup and may not be aware of these extensions. Quarto provides these features out of the box.

More scannable code cell/chunk options and syntax: R Markdown users may be familiar with setting document-wide execution options inside a setup chunk using knitr::opts_chunk$set(...), and specifying chunk options inline in a comma-separated format. While functional, this approach can become difficult to scan and maintain in larger documents.

```{r}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE, error = FALSE)
```

```{r barplot, fig.cap='Bar plot for y by x.', fig.height=3, fig.width=5}
# code for the bar plot
```

In Quarto, document-level execution options are defined declaratively in YAML, while cell-level options use a multi-line, comment-style syntax (lines beginning with #|). This makes both levels of configuration easier to scan, and typically easier to review in diffs and modify.

```yaml
execute:
  echo: false
  warning: false
  error: false
  message: false
```

```{r}
#| label: fig-barplot
#| fig-cap: 'Bar plot for y by x.'
#| fig-height: 3
#| fig-width: 5
# code for the bar plot
```

Centralized documentation: While R Markdown benefits from a large community and extensive online resources, its documentation is distributed across multiple sites². Quarto, by contrast, maintains a single, comprehensive documentation portal that covers core usage and advanced features in one place, making it easier to navigate and learn systematically.

Quarto project structure

The remaining components of the template are primarily related to Quarto. Below, I break down the key elements and explain the reasoning behind their organization.

_extensions
.quarto (created by Quarto; not tracked in Git)
extras/
manuscript/
├── manuscript.pdf (not tracked in Git)
└── manuscript.qmd
slides/
├── slides.pdf (not tracked in Git)
└── slides.qmd
_quarto.yml

Configuration file

The _quarto.yml file serves two main purposes.

First, it defines project-level execution behavior. In particular, the following setting ensures that code is executed relative to the project root, regardless of where individual .qmd files are located:

project:
  execute-dir: project

This provides an additional layer of protection for resolving relative paths correctly. Used together with the here package, it helps ensure that file paths behave consistently across different machines and execution contexts.

Second, _quarto.yml centralizes shared configuration options so they do not need to be repeated in each individual .qmd file. This reduces duplication, minimizes the risk of inconsistencies, and improves readability across documents.

In the template, shared options include the bibliography file and citation style:

bibliography: extras/references.bib
csl: extras/apa.csl

Because these settings are defined at the project level, they apply automatically to both the manuscript and presentation slides.

Extras: bibliography and citation styles

As mentioned above, the extras/ folder contains the bibliography file (references.bib) and citation style file (apa.csl). These are referenced in the _quarto.yml configuration file, which means they are automatically available to all .qmd files in the project without needing to specify them individually.

extras/
├── references.bib
└── apa.csl

Ideally, if you have other supplementary materials that are not part of the core code or data but are still relevant to the project (e.g., codebooks), they could also be stored in this folder. However, I have kept it focused on bibliography-related files for simplicity.

Bibliography file

The references.bib file is a standard BibTeX/BibLaTeX bibliography file familiar to LaTeX users. It contains structured reference entries, including fields such as author, title, journal, year, and other publication metadata.

Entries can be exported from reference managers such as Zotero or Mendeley, or generated directly within Quarto using the Visual Editor. Crucially, the bibliography file is kept separate from both the manuscript and the citation style. This separation allows the same reference database to be shared across manuscripts and slides, while making it easy to change formatting styles without modifying the source content.

Citation style file

The apa.csl file is a Citation Style Language (CSL) file that defines how citations and bibliography entries are formatted. It acts as a translation layer between the structured data in references.bib and the rendered output format.

In this template, the CSL file specifies APA style, which is common in the social sciences. However, switching styles is straightforward: replace the apa.csl file with another CSL file (e.g., Chicago, MLA, or a journal-specific style) and update the reference in _quarto.yml. No changes to the manuscript text are required.

To find a CSL file for a specific discipline or journal, you can browse the CSL Style Repository, which contains thousands of maintained styles.

Manuscript

Dissemination of research findings is the ultimate goal of the research process, and the manuscript remains its primary vehicle.

manuscript/
├── manuscript.pdf (not tracked in Git)
└── manuscript.qmd
_extensions/
└── kv9898/
    └── orcid/

The manuscript/ folder contains the Quarto source file (manuscript.qmd) and the compiled PDF output (manuscript.pdf). As with other generated artifacts, the PDF is excluded from version control because it can always be regenerated from the source document.

The example manuscript.qmd provides a minimal template illustrating a typical academic structure, including numbered sections, figures, tables, cross-references, and citations. The template is intentionally simple but can be extended with additional formatting and structural elements as needed.

While Quarto’s built-in PDF format supports core elements such as title, authors, date, and abstract, more specialized academic requirements—such as detailed affiliation formatting, ORCID display, keywords, custom headers, or journal-style front matter—often require additional customization.

To address this, I created a custom extension that builds on the default PDF format and adds features commonly required in academic manuscripts. The extension is included in the template under _extensions/kv9898/orcid/.

Packaging the manuscript format as a Quarto extension ensures that formatting logic is versioned and shared alongside the project, rather than maintained as ad hoc local tweaks. The extension is activated via the YAML front matter of manuscript.qmd:

format:
  orcid-pdf:

The resulting PDF output more closely resembles a conventional academic paper, with structured author information, affiliations, ORCID identifiers, and keywords clearly presented. Because the template is implemented as a Quarto extension, it remains portable and reusable across projects, and can be further modified, particularly by those comfortable with LaTeX, to accommodate journal-specific formatting requirements.

Slides

Researchers often need to present their findings at conferences, seminars, or in teaching settings. Quarto supports multiple presentation formats, including Reveal.js, Beamer, and PowerPoint. In this template, I use Beamer to produce PDF slides, which are widely accepted in academic contexts and easy to share or print.

slides/
├── slides.pdf (not tracked in Git)
└── slides.qmd

The slides/ folder contains the Quarto source file (slides.qmd) and the compiled PDF output (slides.pdf). As with the manuscript, the PDF is treated as a generated artifact and excluded from version control.

The resulting slides are indistinguishable from those produced using a traditional LaTeX Beamer workflow. The key difference is that all content—including code, tables, and figures—can be generated directly from the same analytical pipeline.

In this example, I deliberately reuse the same output objects (e.g., tables and figures) in both the manuscript and the slides. This guarantees consistency across formats and eliminates the risk of discrepancies between what appears in the paper and what is presented publicly.

The example slides also demonstrate practical considerations such as resizing tables and figures to fit slide layouts. They reference the same shared bibliography file used in the manuscript, ensuring consistent citation formatting across outputs. Although only a single reference is included in the example, the template is configured to allow references to span multiple frames, accommodating a realistic bibliography.

README file

README.md

Following both GitHub conventions and academic best practice, every project should include a README.md file. This file serves as the primary entry point for others who wish to understand, reproduce, or build upon the research.

In the template, the README provides:

A brief project description
Instructions for setting up the environment
Steps to reproduce the analysis, manuscript, and slides

Additional placeholders are included for information such as the machine model and operating system used during development. While not always necessary, this metadata can be helpful when troubleshooting platform-specific issues, particularly for projects involving compiled dependencies. The README also notes the approximate time required to run the full analysis and render outputs, which helps set realistic expectations for replication.

Notably, the template does not include a LICENSE file by default. This is intentional. The appropriate license for academic code and data depends on disciplinary norms, institutional policies, journal requirements, and the researcher’s intended level of openness. Common choices include MIT or GPL licenses for code, and Creative Commons licenses for data. In some cases, more restrictive or custom licenses may be appropriate. Researchers should select a license deliberately, ensuring it aligns with their sharing goals and complies with relevant policies.

GitHub as infrastructure — not just hosting

Once a project is structured clearly and pushed to GitHub, it becomes more than a collection of files. It becomes infrastructure.

A well-organized repository makes collaboration dramatically smoother. Issues can serve as lightweight meeting minutes, evolving naturally into task lists. They can be assigned to specific contributors, grouped into milestones, and tracked over time. Pull requests and branching strategies help keep the main branch stable while allowing experimentation and iterative refinement. Code reviews become part of the workflow rather than an afterthought.

These practices, borrowed from software development, translate surprisingly well into academic collaboration. Instead of emailing attachments back and forth, collaborators work against a shared, versioned source of truth.

A clear project structure also makes modern AI tools significantly more useful. When your data, scripts, outputs, and manuscripts are logically organized, AI assistants in VS Code, Positron, or GitHub can reason about your project more effectively. They can trace how tables were generated, suggest improvements to analysis code, help refine writing based on the underlying results, or flag inconsistencies between figures and text. In other words, organization enables context — and context is what makes AI assistance meaningful rather than superficial.

There are also practical benefits. Once your work is version-controlled and backed up remotely, you no longer fear data loss due to a failed hard drive, a stolen laptop, or accidental overwrites. The repository itself becomes a durable record of the project’s evolution.

Perhaps most importantly, a well-structured project reduces the asymmetry of knowledge among collaborators. Instead of each co-author being familiar with only one portion of the workflow, everyone can develop a holistic understanding of how the project fits together — from raw data to final manuscript. This makes feedback more constructive, collaboration more efficient, and the research process more transparent.

Reproducibility, then, is not merely about satisfying journal requirements. It is about building research projects that are resilient, collaborative, and adaptable — projects that scale not only across machines, but across people.

Conclusion

At its core, none of the tools discussed here—Git, renv, Quarto, or GitHub—are revolutionary on their own. What matters is how they are combined into a coherent project structure. Once that structure becomes habitual, reproducibility stops being an afterthought and becomes the default.

Adopting this workflow does not require perfect foresight or advanced technical expertise. It simply requires deciding, from the outset, that clarity, versioning, and regeneration will guide the project. The payoff is substantial: fewer replication headaches, smoother collaboration, better integration with modern tooling, and greater confidence in the durability of your work.

In the long run, a well-structured project is not just easier to reproduce—it is easier to think with.

¹ When R and Python are combined within the same document, however, Quarto uses reticulate under the hood, similar to R Markdown.
² For example, see the official R Markdown documentation for the core features, and Yihui’s personal site for more advanced features.

April Release Highlights

Cindy Tong — Tue, 07 Apr 2026 00:00:00 +0000

Tip

Subscribe to get this newsletter directly in your email inbox.

Welcome to the first edition of our Positron newsletter! Here, we will share highlights from our latest release, tips on how to be more productive with Positron, and useful resources.

We just returned from an in-person onsite in beautiful Monterey, California. During the trip, we got a chance to meet (some of us for the first time), touch grass and sand, and brainstorm ways we can improve to build better products for you.

Let’s get into the updates.

Key Product Updates

The April 2026 release of Positron brings significant improvements across:

Positron Server for Academic Use via JupyterHub
AI enhancements: Next Steps in Jupyter Notebooks, Agent Skills, and Azure AI Foundry Support
Telemetry updates
R improvements: Addins, Debugging, and more
Data Explorer Performance Improvement
Windows ARM in GA
What’s Coming Next: Inline Outputs, Packages Pane, and Posit Assistant

Here’s a look at the key features that shipped with the April 2026 release.

Positron Server for Academic Use via JupyterHub

What we built: Academic institutions can now offer Positron Server to their students at no cost through JupyterHub ( blog post ). If your institution already runs JupyterHub, you can add Positron as a launcher option alongside JupyterLab, with no additional infrastructure required. Students simply log in and select Positron from the launcher, getting the full Positron experience including rich Python and R support, the extension marketplace, and (optionally) Positron Assistant.

Why this matters: This removes the barrier for students and educators who want to use Positron in a classroom setting. No local installs, no configuration headaches — just a familiar JupyterHub login with Positron ready to go.

Get started: Review the eligibility criteria and send an email to academic-licenses@posit.co to request a free teaching license.

AI Next Steps in the Native Jupyter Notebook Editor

What we built: AI Next Steps uses the Positron Assistant to analyze your current cell output and suggest a logical next step in a “ghost cell” at the bottom of your notebook. If you just loaded a CSV, it might suggest data cleaning steps or a visualization, without you needing to open a chat pane or write a prompt. Suggestions stay aligned with the notebook’s live kernel state, updating as your code and outputs change.

Why this matters: The design came out of interviews with data scientists who kept telling us the same thing: switching to a chat pane mid-analysis breaks their concentration. AI Next Steps sits at the bottom of your notebook and updates as your outputs change. You just run a cell, and if there’s a logical next step, it surfaces, with no prompt required.

Get started: Enable the feature by setting positron.assistant.notebook.ghostCellSuggestions.enabled to true in your settings. When you run a cell, look for the ghost cell suggestion at the bottom of the notebook, then accept, reject, or hide it.
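
If you prefer editing settings as JSON, the entry would look roughly like this. This is a sketch that assumes the setting can be placed in your user or workspace settings.json like other editor settings.

```json
{
  "positron.assistant.notebook.ghostCellSuggestions.enabled": true
}
```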

Agent Skills in Positron Assistant

What we built: Agent skills — reusable, structured capabilities that extend what agents can do in agent.md files — are now integrated into Positron ( #11753 ). Skills let agents execute multi-step workflows like “profile this dataset and suggest cleaning steps” or “run this test suite and summarize failures,” so you define a task once and reuse it across sessions and projects.

Why this matters: Skills make agents composable building blocks rather than one-off chat interactions. Instead of re-explaining a complex workflow every time, you codify it as a skill that any team member can use.

Get started: Open the chat gear icon and select Skills, or run Chat: Configure Skills from the Command Palette.

Positron Assistant Now Supports Microsoft Foundry as a Provider

What we built: Positron Assistant now supports Microsoft Foundry as a model provider ( #8583 ) with API key-based access via a custom base URL.

Why this matters: If your team runs on Azure and uses LLMs through Foundry, you can now use Positron Assistant with them.

Get started: In Positron Assistant’s provider settings, set positron.assistant.provider.msFoundry.enable to true to make Microsoft Foundry available as a provider. You can then authenticate with an API key and your Foundry endpoint URL.

Telemetry Update: Anonymous Session Identifiers

What we changed: Positron now generates an anonymous, random session identifier to help us understand usage patterns like session frequency and retention across releases. This identifier contains no personal information, account data, or workspace content; it’s a cryptographically random UUID that cannot be linked to any other identifiers, including the identifier that VS Code uses for telemetry.

Why we’re doing this: As a free, source-available project, we don’t have traditional product analytics. Understanding whether people come back, how often they use Positron, and whether releases improve or regress the experience helps us prioritize the right work to build a better experience for you.

You can opt out by updating your settings as outlined here, or you can reset the anonymous identifier with the command Preferences: Reset Anonymous Telemetry ID. If you’ve opted out of product updates, no session identifier is generated or sent.

RStudio Addins Support

What we built: Positron now supports running RStudio addins from R packages. If a package registers an addin (like styler, reprex, clipr, or shinyuieditor), you can run it directly from Positron ( #1313 ).

Why this matters: This was one of our most upvoted issues this release (25 👍). Many R users rely on addins as part of their daily workflow for code formatting, generating reproducible examples, or launching Shiny tools.

Get started: Open the Command Palette (Ctrl-Shift-P on Windows and Linux, Command-Shift-P on macOS) and search for Run RStudio Addin. You’ll see a quick pick with all available addins from your installed packages.

R Debugger & Workflow Improvements

What we built: The R debugger received a suite of improvements this release. In addition to conditional breakpoints, hit count breakpoints, and log breakpoints ( #12360 ), the debugger now supports error and warning breakpoints ( #11797 ), the ability to pause R at any time ( #11799 ), Watch Pane support ( #1765 ), and synchronization between the Console and Variables pane with the selected call stack frame ( #3078 and #12131 ).

Why this matters: Advanced debugging in R has traditionally meant scattering if (...) browser() calls through your code or setting options(error = recover) by hand. These new features put Positron’s R debugger on par with what you’d expect from any modern language:

Conditional, hit count, and log breakpoints let you control exactly when breakpoints fire and print diagnostic info, all without touching your source code.
Error and warning breakpoints drop you into the debugger the moment an error or warning is emitted, so you can inspect the state that caused it.
Pause R at any time. If R is stuck in a long computation or an infinite loop, you can drop into the debugger mid-execution, look around, and resume by clicking Continue.
Watch Pane lets you track expressions across debug steps. Prefix an expression with /print to see R’s printed output (hover to get full output) instead of a structured variable.
Synchronization with the call stack. Click any frame in the Call Stack view and the Console, completions, and Variables pane all switch to that frame’s environment. The Console synchronization is like recover(), but built into the IDE.

Get started: Set a breakpoint in any R file, then right-click it and choose Edit Breakpoint. Select “Expression” to add a condition (e.g., i > 100), “Hit Count” to break after N hits, or “Log Message” to print a message without pausing. For error and warning breakpoints, open the Breakpoints pane and enable them there. To pause R while code is running, use the command Debug: Pause or check the Interrupt breakpoint option in the Breakpoints pane. While debugging, add expressions in the Watch section of the debug sidebar and click on frames in the Call Stack to navigate environments.

Data Explorer: Faster with Multiple DataFrames

What we built: We fixed two long-standing performance issues in the Data Explorer. Background Data Explorer tabs no longer trigger backend recomputation, and the summary panel no longer recalculates summary statistics for large DataFrames on every cell execution ( #4279 and #2795 ).

Why this matters: If you work with multiple DataFrames open, you may have noticed lag as Positron recomputed statistics for tabs you weren’t even looking at. That’s gone now.

Get started: Nothing to configure. When you open multiple DataFrames in the Data Explorer and switch between them, you should notice snappier performance, especially with large datasets.

Windows ARM Is Generally Available

What we built: We started creating experimental builds for Windows ARM several months ago, and our early users have had good experiences with them. This release, we promoted the Windows ARM builds from experimental to stable and they are now available through all standard installation channels ( #12207 ).

Why this matters: ARM-based devices are increasingly common for Windows users, whether you’re a student or a professional. GA support means these users get the same Positron experience, including Quarto with R and Python support, without needing workarounds or experimental builds. Do be aware that the Windows ARM build bundles the non-ARM version of Quarto, which runs under emulation.

Get started: Install Positron on your ARM-based Windows device through standard installation channels .

View all issues in the 2026.04.0 Release milestone .

What’s Coming Next

We are currently building the following features and would love your feedback; please share it on GitHub. These are early alpha features with some rough edges, and you can test them by enabling their respective settings.

Inline Outputs for Quarto and R Markdown Files

This was the second most upvoted issue we have ever, ever had! We just completed an initial run to allow displaying inline outputs within Quarto and R Markdown files ( #5640 ), and it is available for early testing. Note that this experimental version, while it does get the basics into Positron, does not have support for many popular RStudio features. You can opt in to the experimental feature using the positron.quarto.inlineOutput.enabled setting.

Packages Pane for Managing Environments

We are currently building out a new Packages pane that will allow you to install, update, and uninstall packages without leaving your workspace or needing to use the terminal ( #11214 ). We’d love to hear your feedback on this discussion thread .

Events and Resources

Explore Positron’s Video Walkthroughs on YouTube

We hosted a walkthrough of exploring GitHub data in a Jupyter Notebook and converting this into an interactive Shiny app with AI. Catch up on the recording or explore more Positron videos .

Registration for posit::conf(2026) Is Now Open!

Registration is officially open for posit::conf(2026)! Join the global data community in Houston or tune in online from September 14–16. Register today!

How We Chose a Python Type Checker

Ever wondered how we chose which Python type checker to bundle in Positron? Check out Austin Dickey’s blog post, which walks through his research and decision-making process.

Community Affirmations

Thank you all for your support, ideas, and engagement. We’re building Positron in the open because the best ideas come from the people using it. If there’s a feature you’d love to see, open an issue or upvote an existing one; it genuinely shapes what we work on next.

Have a great April!

Positron Team

Positron Server available for academic use via JupyterHub

Isabel Zimmerman — Mon, 06 Apr 2026 00:00:00 +0000

Academic institutions can now offer Positron directly within their existing JupyterHub environments, giving students a robust data science IDE without a local install or new infrastructure. With a free teaching license, institutions can provide Positron Server to currently enrolled students for use in coursework, so every student works in the same consistent, fully featured environment.

Students can launch Positron the same way they would open JupyterLab or a notebook. Just select it from the JupyterHub launcher and start working.

Once launched, Positron provides the full IDE experience, including:

Rich Python and R support
Access to the OpenVSX extension marketplace
Built-in data viewer and variables explorer
Integrated help pane, debugger, version control and other features to help students level up when they’re ready

How it works

Positron Server is designed to integrate directly with existing JupyterHub deployments. It’s compatible with JupyterHub environments running JupyterLab 4 and Python 3.9+.

It’s installed via the jupyter-positron-server Python package , built on Jupyter Server Proxy. If you’ve configured similar services before, setup will feel familiar. This is not a standalone desktop install. Rather, it lets you bring Positron into an existing JupyterHub setup.

Who can use it?

This offering is available to academic institutions using Positron for teaching. Under a free license, institutions can provide access to enrolled students, course participants, or staff involved in the delivery or receipt of educational programming.

Full eligibility details are available in the Positron Education License Rider .

Getting started

Hosting Positron for teaching purposes requires a free license key. To get set up:

Review the eligibility criteria in the Positron Education License Rider .
Email academic-licenses@posit.co to request a teaching license.
Once your license is confirmed, follow the jupyter-positron-server documentation to complete setup in your JupyterHub environment.

Get in touch

Have questions or want to learn more?

Reach out to academic-licenses@posit.co and let us know you’re interested in Positron. We’ll help you navigate next steps!

What's next: Quarto 2

Carlos Scheidegger — Mon, 06 Apr 2026 00:00:00 +0000

We’re excited to share an early look at Quarto 2. You might be aware that we recently released Quarto 1.9 , with support for long-standing requests such as PDF accessibility. Quarto is an excellent choice for authors of scientific and technical documents, and the amount and quality of the work you create with it is genuinely humbling for us. Before anything else, we want to thank you for using Quarto; you’re all quite literally the reason we build it.

Quarto 2 is a full rewrite of the Quarto CLI, written from the ground up in Rust to better support your existing use cases, and enable a number of new, exciting use cases. Most importantly, Quarto 2 will include a built-in collaborative editor, and we plan on adding support for collaborative writing in Posit’s commercial products such as Posit Cloud, Connect, and Workbench. With that said, the design of those integrations is still taking shape.

It is also very early in the project. If you interact with the Quarto project solely as a user of the tool, nothing in your workflow will change, and you should proceed as if you didn’t know about our plans for Quarto 2. We don’t expect to have a public release of Quarto 2 for at least 6 months. In addition, we will continue to develop and maintain parallel versions until Quarto 2 is a suitable replacement for users of Quarto 1.

Just like Quarto 1, Quarto 2 is open source and MIT licensed. The GitHub repository for Quarto 2 is currently quarto-dev/q2 .

Why Quarto 2?

There are some fundamental pain points in Quarto 1 that can’t be solved incrementally. The goal of Quarto 2 is not to change how you currently work with Quarto; instead, we’ve arrived at a point where incremental improvements do not provide the value you deserve given our team size and constraints. These are some of the things we want to do in Quarto 2:

A new Markdown parser enables tighter integration with editors for the entire rendering pipeline. We know that good error messages, autocompletion, and YAML validation are some of your favorite features in Quarto 1. Quarto has about 1,000 different YAML configuration options, and we know how important it is to be able to provide good error messages. We want to extend this same idea to everything in your Quarto project: Markdown syntax errors, Lua filter errors, broken links, etc. Whenever possible, these should be flagged in your editor of choice.
A fundamental solution for long-standing performance problems. Quarto 1 is built by integrating a number of tools that work very well in isolation, but aren’t designed to be performant when used together. A full rewrite of the Quarto core functionality in a single programming language will enable us to provide much better performance than before.
A collaborative editor. Quarto 2 will ship with a collaborative editor designed to work directly on the web as well as on the command-line. Keeping in the tradition and ethos of the Quarto project, this will include a robust open-source foundation based on automerge , as well as a commercial solution for hosted project management. This follows the relationship between Quarto 1 and its integration with other Posit commercial offerings.
A visual editor that works well alongside a source editor. The visual editor we ship in RStudio, VS Code, and Positron works well if everyone working on the document is using the visual editor. On the other hand, if you choose the visual editor, but your colleague chooses the source editor, then you’ll find that the experience is full of sharp edges. Quarto 2 is built from the ground up to support bidirectional editing workflows. A small change in your document using the visual editor shouldn’t cause a large change in the .qmd file that is disruptive for your colleagues using a source editor.
Support for Quarto 1 projects. We aim for Quarto 2 to be backwards compatible with Quarto 1. Concretely, we’re aiming to incorporate our Quarto 1 test suite directly into Quarto 2’s project, including support for Pandoc and its output formats that our community depends on. Your existing extensions and projects should just work in Quarto 2. Early on, there will be gaps, and Quarto 2 will initially be a better fit for new projects.

What happens to Quarto 1 development?

Quarto 1 is not going anywhere, and it will remain in active development for at least the next year. We’ll still provide bug fixes and accept pull requests.

Current status

The development is happening in a separate GitHub repository . Feel free to look around! However, this code base isn’t ready for public consumption, and is very much in flux: that means we’re not going to spend a lot of time answering architectural questions about it until things have settled, and all discussion of Quarto should remain in our current discussion forum and issue tracker.

There are big, interesting changes in the Quarto 2 architecture, and they deserve a longer exposition. We are working on those documents right now, and will share them with you in the next few weeks. Stay tuned!

Shiny for Python 1.6 brings toolbars and OpenTelemetry

Liz Nelson — Thu, 02 Apr 2026 00:00:00 +0000

We’re pleased to announce that Shiny for Python v1.6 is now available on PyPI !

Install it now with pip install -U shiny.

This release has two big additions: toolbar components for building compact, modern UIs, and OpenTelemetry support for understanding how your apps behave in production. A full list of changes is available in the CHANGELOG .

Toolbars

Toolbars are a new set of compact components designed to fit controls into tight spaces — card headers and footers, input labels, and text areas. They’re perfect for dashboards that are running out of room, or for AI chat interfaces where you want to add controls without cluttering the layout.

The core components are:

Component	Description
`ui.toolbar()`	Container for toolbar inputs
`ui.toolbar_input_button()`	A small action button
`ui.toolbar_input_select()`	A compact dropdown select
`ui.toolbar_divider()`	A visual separator
`ui.toolbar_spacer()`	Pushes items to opposite sides

Each input also has a corresponding ui.update_toolbar_input_*() function for updating it dynamically.

Toolbars in card headers and footers

The most common use case is placing a toolbar in a card header to attach controls directly to a card’s content:

from faicons import icon_svg
from shiny.express import input, render, ui

with ui.card(full_screen=True):
    with ui.card_header():
        "Header"
        with ui.toolbar(align="right"):
            ui.toolbar_input_button(
                id="action1",
                label="Refresh",
                icon=icon_svg("arrows-rotate"),
            )
            ui.toolbar_divider()
            ui.toolbar_input_select(
                id="options",
                label="Filter",
                choices=["ABC", "CDE", "EFG"],
            )

    @render.text
    def toolbar_status():
        return f"Button clicks: {input.action1()}, Selected: {input.options()}"

Toolbars in input labels

You can also pass a toolbar as an input’s label to add an info button for additional information or provide quick actions, like resetting an input value.

from faicons import icon_svg
from shiny.express import ui

with ui.card():
    ui.card_header("Data Settings")
    ui.input_slider(
        "threshold",
        label=ui.toolbar(
            ui.toolbar_input_button(
                "threshold_info",
                label="About this setting",
                icon=icon_svg("circle-info"),
                tooltip="Standard deviations from the mean before a value is flagged as an outlier.",
            ),
            "Outlier threshold",
            align="left",
        ),
        min=1,
        max=5,
        value=2,
        step=0.5,
    )
    ui.input_numeric(
        "sample_size",
        label=ui.toolbar(
            ui.toolbar_input_button(
                "sample_info",
                label="About this setting",
                icon=icon_svg("circle-info"),
                tooltip="Number of observations to draw from the dataset for each analysis run.",
            ),
            "Sample size",
            align="left",
        ),
        value=100,
        min=10,
        max=1000,
        step=10,
    )

Toolbars in text areas

The input_submit_textarea() component accepts a toolbar parameter directly, making it easy to add contextual controls for AI chat interfaces and message composers:

from faicons import icon_svg
from shiny import reactive
from shiny.express import input, render, ui

ui.page_opts(fillable=False)

messages = reactive.value([])

with ui.card(full_screen=True, height="250px"):
    ui.card_header("Message Composer")
    with ui.card_body():
        ui.input_submit_textarea(
            "message",
            label="Message",
            placeholder="Compose your message...",
            rows=4,
            toolbar=ui.toolbar(
                ui.toolbar_input_select(
                    "priority",
                    label="Priority",
                    choices=["Low", "Medium", "High"],
                    selected="Medium",
                    icon=icon_svg("flag"),
                ),
                ui.toolbar_divider(),
                ui.toolbar_input_button(
                    "attach",
                    label="Attach",
                    icon=icon_svg("paperclip"),
                ),
                align="right",
            ),
        )

with ui.card(full_screen=True, height="250px"):
    ui.card_header("Sent Messages")

    with ui.card_body():
        @render.ui
        def messages_output():
            msg_list = messages.get()
            if not msg_list:
                return ui.p("No messages sent yet.", style="color: #888;")

            return ui.div(
                *[
                    ui.p(
                        f"[{msg['priority']}] {msg['text']}",
                        style="margin: 4px 0;",
                    )
                    for msg in reversed(msg_list)
                ]
            )

@reactive.effect
@reactive.event(input.message)
def _():
    message_text = input.message()
    if message_text and message_text.strip():
        current_messages = list(messages.get())
        current_messages.append(
            {"text": message_text, "priority": input.priority()}
        )
        messages.set(current_messages)

Toolbars are available in py-shiny and forthcoming in bslib for R. For a complete walkthrough with full app examples, see the Toolbar component page .

OpenTelemetry

Starting with Shiny v1.6.0, OpenTelemetry support is built directly into the framework.

OpenTelemetry (OTel) is a vendor-neutral observability standard that lets you collect telemetry data — traces, logs, and metrics — and send it to any compatible backend. For Shiny apps, this means you can finally answer questions like:

Why is my app slow for certain users?
Which reactive expressions are taking the most time?
How long does it take for outputs to render?
What sequence of events occurs when a user interacts with my app?

Getting started

The fastest way to get started is with Pydantic Logfire , which provides zero-configuration OTel setup:

pip install logfire
logfire auth

Then set an environment variable to tell Shiny what level of tracing to collect:

export SHINY_OTEL_COLLECT=reactivity

That’s it — no changes to your app code required. Run your app and visit logfire.pydantic.dev to see traces.

OTel is great for GenAI apps

Shiny’s OTel integration pairs especially well with Generative AI applications. When a user reports that your chatbot feels slow, traces make it easy to pinpoint whether the delay is in the AI model request, streaming, tool execution, or a downstream reactive calculation.

The image below shows a trace from a weather forecast app powered by a Generative AI model. A single user session is captured in full detail:

Collection levels

SHINY_OTEL_COLLECT accepts five levels of detail:

"none" - No Shiny OpenTelemetry tracing
"session" - Track session start and end
"reactive_update" - Track reactive updates (includes "session" tracing)
"reactivity" - Trace all reactive expressions (includes "reactive_update" tracing)
"all" [Default] - Everything (currently equivalent to “reactivity”)

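One way to picture the "includes" relationships in this list is as an ordered scale, where each level collects everything the levels below it collect. The sketch below is only a mental model in plain Python, not Shiny's internal code; `LEVELS` and `collects` are hypothetical names introduced for illustration.

```python
# Collection levels in increasing order of detail, as listed above.
LEVELS = ["none", "session", "reactive_update", "reactivity", "all"]

def collects(configured: str, wanted: str) -> bool:
    """Return True if the configured level also covers the `wanted` level."""
    # A level covers every level that appears at or before it in LEVELS.
    return LEVELS.index(configured) >= LEVELS.index(wanted)

print(collects("reactivity", "session"))       # True
print(collects("session", "reactive_update"))  # False
```

Under this model, choosing "reactive_update" keeps session spans but leaves out the per-expression spans that "reactivity" adds.
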
What gets traced automatically

Shiny automatically creates spans for all of the following — no manual instrumentation needed:

Session lifecycle: When sessions start and end, including HTTP request details
Reactive updates: The entire cascade of reactive calculations triggered by an input change or a new output to be rendered
Reactive expressions: Individual calculations such as @reactive.calc, @reactive.effect, @render.*, and other reactive constructs

Works with any OTel backend

Logfire is our recommended starting point, but Shiny’s OTel integration is fully vendor-neutral. You can send traces to Jaeger , Zipkin , Grafana Cloud , Langfuse , or any other OTLP-compatible backend.

For local debugging without a backend, install the OpenTelemetry SDK and use the console exporter:

pip install "shiny[otel]"

Full documentation — including custom spans, database instrumentation, and production considerations — is available in the OpenTelemetry guide .

In closing

We’re excited to bring you these new features in Shiny v1.6. As always, if you have questions or feedback, join us on Discord or open an issue on GitHub . Happy Shiny-ing!

How we chose Positron's Python type checker

Austin Dickey — Tue, 31 Mar 2026 00:00:00 +0000

The open-source Python type checker and language server ecosystem has exploded. Over the past year or two, four language server extensions have appeared, each with a different take on what Python type checking should look like. We evaluated each of them to decide which one to bundle with Positron to enhance the Python data science experience.

Background

The Language Server Protocol (LSP) is a cross-language, cross-IDE specification that allows different IDE extensions to contribute smart features like tab completions, hover info, and more. The four¹ Python extensions in this post are powered by type checkers, which are Python-specific tools that catch bugs before runtime by statically analyzing your code to infer and check the types of your variables.

Tip

Positron’s built-in language server uses your running Python session to provide runtime-aware completions and hover previews too! Beyond what’s in code, it knows your DataFrame column names, your dictionary keys, your environment variables, and more. But the tools evaluated in this post handle the static analysis side: type checking, go-to-definition, rename, and code actions. Both run concurrently, and Positron merges their results.

With AI tools writing more of your code, a good language server helps you read and navigate code you didn’t write. LLM-generated code also introduces bugs that type checkers catch before you run anything. For data scientists, who rely on code to be the reproducibility layer, and who can’t automate away human judgment, what matters is a tool that helps you understand and trust your code.

We did this evaluation in November 2025 but refreshed the data in this post at the time of publication.

The contenders

Tool	Backing	Language	License	Stars
Pyrefly	Meta	Rust	MIT	5.5K
ty	Astral (OpenAI)	Rust	MIT	17.8K
Basedpyright	Community	TypeScript	MIT	3.2K
Zuban	Indie	Rust	AGPL-3.0	1K

Pyrefly is Meta’s successor to Pyre. It takes a fast, aggressive approach to type inference, being able to catch issues even in code with no type annotations. It reached beta status in November 2025.

ty is from Astral, the team behind uv and ruff. OpenAI announced its acquisition of Astral recently; Astral has stated that ty, ruff, and uv will remain open source and MIT-licensed. It’s the newest project, with a focus on speed and tight integration with the Astral toolchain. It reached beta status in December 2025 and follows a “gradual guarantee” philosophy (more on that below).

Basedpyright is a community fork of Microsoft’s Pyright type checker, with additional type-checking rules and LSP features baked in. It’s the most mature of the four and has the largest contributor base.

Zuban is from David Halter, the author of Jedi (the longtime Python autocompletion library). It aims for mypy compatibility and ships as a pip-installable tool.

What we tested

We tested each language server across several dimensions, roughly following the rubric we outlined publicly :

Feature completeness: Completions, hover, go-to-definition, rename, code actions, diagnostics, inlay hints, call hierarchy
Correctness: How well does the type checker handle real-world Python code?
Performance: Startup time and time to first completion
Ecosystem: License, community health, development velocity, production readiness

We tested inside Positron with a mix of data science and general Python code.

Feature completeness

Here are some screenshots of hovers, tab-completions, and diagnostics from each extension:

[Screenshots: Pyrefly, ty, Basedpyright, Zuban]

All four provide the core features you’d expect: completions, hover documentation, go-to-definition, semantic highlighting, and diagnostics. The differences show up in the details.

Pyrefly

Strong feature set. The hover documentation is the best of the four; Pyrefly renders it cleanly and sometimes includes hyperlinks to class definitions.

ty

Fast and clean, now in beta. The completion details can sometimes feel a little overwhelming, but can help when expanded.

Basedpyright

Handles type checking comprehensively. The main friction point: it surfaces a lot of warnings out of the box. If you’re doing exploratory data science, a wall of type errors on your first pandas import can feel hostile. You can tune this down, but the defaults are oriented toward stricter use cases like package development.

Zuban

The least mature of the four so far. Installation requires a two-step process (pip install zuban, then configure the interpreter), and the analysis is tied to that specific Python installation on saved files only. Third-party library completions only work when stubs are available, not from installed packages. Symbol renaming once broke standard library code in our testing.

Type checking philosophy

The bigger difference between these tools isn’t features but how they think about type checking.

Gradual guarantee vs. aggressive inference

ty follows what’s called the gradual guarantee: removing a type annotation from correct code should never introduce a type error. The idea is that type checking should be additive. You opt in by adding types, and the checker only flags things it’s sure about.

The other extensions take the opposite approach. They always infer types from your code, even when you haven’t written any annotations. This means they can catch bugs in completely untyped code, but it also means they may flag code that runs perfectly fine.

For example:

my_list = [1, 2, 3]
my_list.append("foo")

# Pyrefly: bad-argument-type
# ty: (no diagnostic)
# Basedpyright: reportArgumentType
# Zuban: arg-type

Pyrefly infers my_list as list[int] and flags the append("foo") call as a type error. ty sees no annotations and stays silent. The code is dynamically typed and that’s fine.

If you’re doing exploratory data analysis and don’t want to annotate everything, ty’s restraint might be more comfortable. But if you’re writing a library and want to catch bugs early, Pyrefly’s aggressiveness is helpful. For example:

def process(data):
    return str(data)

process(42) + 1  # Raises a TypeError at runtime (str + int)

# Pyrefly: unsupported-operation
# ty: (no diagnostic)
# Basedpyright: reportOperatorIssue
# Zuban: operator

Basedpyright and Zuban land somewhere in between, with Basedpyright leaning toward stricter checking and Zuban aiming for mypy compatibility. Each of these extensions lets you suppress specific diagnostics in the editor if you prefer.
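
To make the `process` example concrete, the sketch below runs it and catches the failure. The `-> str` return annotation is an addition for illustration, not part of the original example. It gives any checker an explicit type to compare the `+ 1` against, so even a checker that stays quiet on unannotated code has something to work with.

```python
def process(data) -> str:
    # The return annotation states the result type explicitly.
    return str(data)

try:
    process(42) + 1  # str + int is not a supported operation
    failed = False
except TypeError as exc:
    failed = True
    print(f"TypeError: {exc}")
```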

For a deeper dive on this topic, Edward Li’s comparison of Pyrefly and ty and Rob Hand’s overview of future Python type checkers are both worth reading, though some bugs have been fixed since they were published.

Performance

We measured startup time (how long until the language server responds to an initialize request) and time to first completion (how long a textDocument/completion request takes after initialization) in a relatively small repository. We ran each measurement five times and averaged. As always, these results only represent our computer’s experimental setup.

LSP	Avg. startup (s)	Avg. first completion (ms)
Pyrefly	5.8	190
ty	2.2	88
Basedpyright	3.1	112
Zuban	N/A²	97

ty was the fastest across the board. But the practical differences are small: a 3-second difference in startup happens once per session, and a 100ms difference in completions is imperceptible. All four are fast enough that differences are negligible for daily use.
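
The "run five times and average" procedure described above can be sketched with the standard library. This is not the script used for the measurements; `average_seconds` and `fake_request` are hypothetical names, and the sleep stands in for sending a real LSP request and waiting for its response.

```python
import statistics
import time
from typing import Callable

def average_seconds(action: Callable[[], object], runs: int = 5) -> float:
    """Run `action` several times and return the mean wall-clock duration."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)

# Stand-in workload; a real measurement would send an LSP request here.
def fake_request() -> None:
    time.sleep(0.01)

avg = average_seconds(fake_request)
print(f"average: {avg * 1000:.1f} ms over 5 runs")
```

Averaging over a handful of runs smooths out one-off delays, but with only five samples the result is still sensitive to whatever else the machine is doing.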

Ecosystem health

We also looked at each project’s development velocity and community health metrics. A language server you rely on daily needs to keep up with Python’s evolution.

	Pyrefly	ty	Basedpyright	Zuban
GitHub stars	5.5K	17.8K	3.2K	1K
Contributors	162	186³	82	17
License	MIT	MIT	MIT	AGPL-3.0
Releases (since Nov 2025)	17	29	10	9
Release cadence	~weekly	~twice weekly	~biweekly	~biweekly
Issues opened (90 days)	540	789	40	125
Issues closed (90 days)	531	712	20	111

ty and Pyrefly are shipping fast. Both release at least weekly and have high issue throughput. ty’s issue volume is notable: 789 issues opened in 90 days reflects both heavy adoption and active bug reporting. Pyrefly is closing issues almost as fast as they are opened (531 closed against 540 opened), a good sign for a beta project.
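
The close rates implied by the table can be computed directly. The short sketch below uses only the opened and closed counts from the table; the variable names are illustrative.

```python
# Issues (opened, closed) over 90 days, copied from the table above.
issues = {
    "Pyrefly": (540, 531),
    "ty": (789, 712),
    "Basedpyright": (40, 20),
    "Zuban": (125, 111),
}

# Fraction of opened issues that were closed in the same window.
close_ratio = {name: closed / opened for name, (opened, closed) in issues.items()}

for name, ratio in close_ratio.items():
    print(f"{name}: {ratio:.0%} of opened issues closed")
```

By this measure Pyrefly closes about 98% of what is opened and ty about 90%, while Basedpyright's 50% comes from a much smaller volume of issues.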

Response times are quick. In a spot-check of recent issues, ty and Pyrefly both had first responses from core maintainers within minutes to hours. Basedpyright’s maintainer responds quickly too, though at a lower volume. Zuban’s maintainer often replies within an hour.

What we chose

We bundled Pyrefly as Positron’s default Python language server.

The deciding factors:

Pyrefly’s clean design decisions felt like the best fit for Positron. The hover docs are rendered and hyperlinked, with sources for type inference. The type inference catches real bugs without requiring you to annotate everything. While it has the strictest type checking, this is configured to a moderate level by default.
It has active development with strong backing. Meta has committed to making Pyrefly genuinely open-source and community-driven, with biweekly office hours and a public Discord. Development velocity is high.
It is MIT licensed, which allows us to bundle it into Positron.

It wasn’t a runaway winner. Basedpyright is more mature and feature-complete. ty has a lot of long-term potential, especially for ruff users and fans of the gradual guarantee, and is closing feature gaps fast. But for the specific use case of “Python data science in an IDE,” Pyrefly had the best balance of features, UX, and readiness.

How to switch

This space is competitive and moving fast, and you shouldn’t feel locked in. Positron makes it straightforward to switch language servers:

Open the Extensions view (Ctrl-Shift-X on Linux and Windows, Command-Shift-X on macOS).
Search for and install the language server you want to try (e.g., basedpyright, ty, or zuban).
Disable Pyrefly: search for pyrefly in Extensions, click Disable.
Reload the window with the command Developer: Reload Window.

Or, if you want to keep Pyrefly installed but prevent it from auto-activating, you can use the extensions.allowed setting:

{
    "extensions.allowed": {
        "meta.pyrefly": false,
        "*": true
    }
}

What’s next

We started bundling Pyrefly in November and have been quite pleased with the results. It solved some longstanding user-requested issues (like better semantic highlighting) and feels snappier to users than our previous internal solution.

ty is adding features at an aggressive pace and will likely close its remaining gaps. OpenAI’s acquisition of Astral adds resources but also uncertainty; it’s unclear how it will affect ty’s priorities. Pyrefly continues to improve its type checking and performance (a recent release noted 20% faster PyTorch benchmarks). Basedpyright tracks upstream Pyright closely and keeps shipping.

Both ty and Pyrefly have been receptive to PRs that improve the experience for Positron users, which suggests they care about working well across editors, not just VS Code. For example, both contribute hover, completions, and semantic highlighting in the Positron Console.

We’ll keep evaluating as these tools mature! Want to try Positron? Download it here.

¹ Another LSP extension is Pylance, which may be familiar to VS Code users, but due to licensing restrictions, Code-OSS forks like Positron cannot use it.
² Zuban requires a multi-step manual startup, so we couldn’t measure this automatically.
³ Edit (2026-04-01): A previous version of this post undercounted the number of contributors to ty. The updated script to fetch stats lives here.

tabpfn 0.1.0

Max Kuhn — Tue, 31 Mar 2026 00:00:00 +0000

We’re stoked to announce the release of tabpfn 0.1.0. TabPFN is a precompiled deep learning Python model for prediction. The R package tabpfn is an interface to this model via reticulate.

You can install it from CRAN with:

install.packages("tabpfn")

What is TabPFN?

The “tab” means tabular, which is code for everyday rectangular data structures that we find in csv files and databases.

The “pfn” is more complicated – it stands for “prior fitted network”. The model is trained on fully synthetic datasets. The developers created a complex graph model that can simulate a wide variety of data-generating methods, including correlation structures, distributional skewness, missing-data mechanisms, interactions, latent variables, and more. It can also simulate random supervised relationships linking potential predictors to the outcome data. The training process for the model simulated a very large number of these data sets that, in effect, constitute a “training set data point”. For example, during training, if a batch size of 64 was used, that means 64 randomly generated datasets were used in that iteration.

From these data sets, a complex deep learning model is created that captures a huge number of possible relationships. The model is sophisticated enough and trained in a manner that allows it to effectively emulate Bayesian estimation.

When we use the pre-trained model, our training set matters, even though there is no new estimation. The model includes an attention mechanism that “primes the model” by focusing on the types of relationships in your training data. In that way, the pre-fitted network is deliberately biased to effectively predict our new samples. This leads to in-context learning.

And it works; in fact, it works really well.

License for the Underlying Model

PriorLabs created TabPFN. Version 2.5 of the model, which contained several improvements, requires an API key for accessing the model parameters. Without one, an error occurs:

This model is gated and requires you to accept its terms. Please follow these steps: 1. Visit https://huggingface.co/Prior-Labs/tabpfn_2_5 in your browser and accept the terms of use. 2. Log in to your Hugging Face account via the command line by running: hf auth login (Alternatively, you can set the HF_TOKEN environment variable with a read token).

The license includes provisions for “Non-Commercial Use Only” if you are just trying it out.

Instructions for installing the package and obtaining the API key are in the package’s manual.

Also, the model is most efficient when a GPU is available (by an order of magnitude or two). This may seem obvious to anyone already working with deep learning models, but it is a fairly new requirement for those strictly working with traditional tabular data models.

Usage

The syntax is idiomatic R: it supports fitting interfaces via data frames/vectors, formulas, and recipes. The standard R predict() method is used for prediction. augment() is also available for prediction.

When evaluating pre-trained models, there is a possibility that they may have memorized well-known datasets (e.g., Ames housing, Palmer penguins). TabPFN isn’t trained that way, but just in case we are worried about that, we’ll use lesser-known data. Worley (1987) derived a mechanistic model for the flow rate of liquids from two aquifers positioned vertically (i.e., the “upper” and “lower” aquifers). We’ll generate some of that data and add completely noisy predictors to increase the difficulty. The outcome is very skewed, so we’ll log that too.

Additionally, we’ll load the tidymodels library for simulation, data splitting, and visualization.

library(tabpfn)
library(tidymodels)
library(probably)

set.seed(17)
aquifier_data <-
 sim_regression(2000,  method = "worley_1987") |>
 bind_cols(sim_noise(2000, 50)) |>
 mutate(outcome = log10(outcome))

We’ll use a stratified 3:1 training and testing split:

set.seed(8223)
aquifier_split <- initial_split(aquifier_data, strata = outcome)
aquifier_split

## <Training/Testing/Total>
## <1500/500/2000>

aquifier_train <- training(aquifier_split)
aquifier_test  <- testing(aquifier_split)

and “fit” the model:

tab_fit <- tab_pfn(outcome ~ ., data = aquifier_train)

Again, the model does not actually fit anything new. This computes the embeddings for the training set data and stores them for the prediction stage.

To make predictions, predict() returns the model’s results. As previously mentioned, a GPU is not strictly required for these computations. However, if more than a trivial amount of data are being predicted, execution time can be very long.

Since we’ll want to evaluate and plot the data, we’ll use augment(), which just runs predict() and binds the results to the data being predicted:

tab_pred <- augment(tab_fit, aquifier_test)

How does it work?

tab_pred |> metrics(outcome, .pred)

## # A tibble: 3 × 3
##   .metric .estimator .estimate
##   <chr>   <chr>          <dbl>
## 1 rmse    standard      0.104 
## 2 rsq     standard      0.937 
## 3 mae     standard      0.0829

tab_pred |> cal_plot_regression(outcome, .pred)

That looks good, especially with no training.

Next Steps

There is a lot more functionality to add to the package, including additional prediction types and interpretability tools. Many of these are available in extensions.

We’ll also add a new parsnip model type for TabPFN and other integrations with tidymodels in the summer.

Acknowledgements

A huge thanks to Tomasz Kalinowski and Daniel Falbel for their support on this and all of their hard work on reticulate and torch.

Thanks also to the contributors to date: @frankiethull, @mthulin, and @t-kalinowski.

Typst Books, Article Layout, and `typst-gather`

Gordon Woodhull — Tue, 31 Mar 2026 00:00:00 +0000

Typst is a lightning-fast typesetting system that provides a modern alternative to LaTeX.

The Typst ecosystem is thriving, and Quarto 1.9 brings Typst much closer to feature parity with LaTeX:

Typst books
Article layout in Typst
Bundling of Typst packages for offline rendering

Typst books

In Quarto 1.9, a project with type book and format typst is now rendered as a single document with multiple chapters and other book content.

_quarto.yml


project:
  type: book

book:
  title: "My Book"
  author: "Jane Doe"
  chapters:
    - index.qmd
    - intro.qmd
    - summary.qmd

format: typst

All book features previously available in the LaTeX format are now available in Typst:

Parts and Chapters
Appendices
Cross-references and chapter-based numbering
Table of Contents

List-of-Figures and List-of-Tables support is coming soon.

The default Typst book uses the bundled Quarto quarto-orange-book extension, which uses typst-gather to bundle the Typst orange-book package. Orange-book provides a textbook-style layout with colored chapter headers and sidebars.

The orange-book extension supports brand.yml customization — it uses the primary color for chapter headers and sidebars, and the medium logo on the title page. The screenshots above were generated with this _brand.yml:

_brand.yml


color:
  primary: "#F36619"
  secondary: "#2E86AB"

logo:
  images:
    test-logo:
      path: logo.svg
      alt: "Test Logo"
  medium: test-logo

Since Typst books are implemented as Quarto Format Extensions, you can customize the appearance by creating your own extension. Typst partials define the overall book structure, while Lua filters handle the necessary AST transformations.

Article layout in Typst

Also in Quarto 1.9, all Article Layout features now work in Typst, via the Typst Marginalia package.

Specifically:

Figures, tables, code listings, and equations can be placed in the margin using the .column-margin class or the column: margin code cell option.
You can also target specific output types with fig-column: margin or tbl-column: margin.
Figure, table, and code listing captions can be placed in the margin with cap-location: margin (or fig-cap-location: margin and tbl-cap-location: margin for specific types).
Footnotes and citations can be displayed in the margin with reference-location: margin and citation-location: margin. When margin citations are enabled, the bibliography is suppressed.
Asides (.aside class) place content in the margin without a footnote number.

Books with article layout are functional, but need work

You can combine book and article layout, but there are some layout quirks when combining the two. We’ll work with the orange-book author to integrate Marginalia into the book template.

`typst-gather`

Quarto 1.9 automatically stages Typst packages — from your extensions, from Quarto’s bundled extensions, and from Quarto itself — into the .quarto/ cache directory before calling typst compile. This means Typst documents render offline without needing network access.

To make this work, extension authors use the new typst-gather tool, which scans their .typ files for @preview imports and downloads the packages into the extension directory. Authors run quarto call typst-gather and commit the results. Users of the extension will have the packages staged without any downloads.
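To make the scanning step concrete, here is a rough Python sketch of what "find the `@preview` imports in a `.typ` file" could look like. It is not how typst-gather is implemented: the regular expression, the `find_preview_imports` helper, and the package version numbers in the example are all assumptions made up for illustration.

```python
import re

# Matches Typst imports such as: #import "@preview/marginalia:0.1.0": note
# (assumed pattern; the real tool may parse Typst source differently)
PREVIEW_IMPORT = re.compile(r'#import\s+"@preview/([A-Za-z0-9_-]+):(\d+\.\d+\.\d+)"')


def find_preview_imports(typ_source: str) -> list[tuple[str, str]]:
    """Return sorted, de-duplicated (package, version) pairs found in Typst source."""
    return sorted(set(PREVIEW_IMPORT.findall(typ_source)))


# Hypothetical file contents; the version numbers are placeholders.
example = '''
#import "@preview/marginalia:0.1.0": note
#import "@preview/orange-book:0.6.0": book
#import "@preview/marginalia:0.1.0": notefigure
#import "local.typ": helper
'''

print(find_preview_imports(example))
```

The local import is ignored and the repeated package is reported once, which is the list an author would want downloaded and committed alongside the extension.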

This means Custom Typst Formats can depend on Typst packages without copying and pasting Typst code, making them simpler and easier to maintain.

Both Typst books and article layout are built on typst-gather: orange-book depends on the Typst orange-book package, and article layout depends on Marginalia. As the Typst package ecosystem grows, we’re excited to see what the community builds with Typst packages.

Quarto 1.9

Charlotte Wickham — Tue, 24 Mar 2026 00:00:00 +0000

Quarto 1.9 is out! You can get the current release from the download page.

Sharing your work just got easier with integrated Posit Connect Cloud publishing. Typst users will appreciate book project support and article layouts, while experimental PDF accessibility standards bring PDF/A and PDF/UA compliance to both LaTeX and Typst. This release also introduces LLM-friendly output for websites, the quarto use brand command for keeping your brand assets in sync, and list tables for authoring complex tables with familiar bullet syntax.

You can read about these improvements and some other highlights below. You can find all the changes in this version in the Release Notes.

Publish to Posit Connect Cloud

You can now publish documents and websites to Posit Connect Cloud directly from the command line. For example, publish your Quarto website project with:

Terminal

quarto publish posit-connect-cloud

Posit Connect Cloud is a hosted platform for sharing data applications and documents without managing your own infrastructure. It includes a free tier for unlimited static document publishing. Read more in Publishing > Posit Connect Cloud.

Improvements to Typst Support

Quarto 1.9 brings substantial improvements to Typst output:

Book projects can now render to Typst via the bundled orange-book extension, with chapter numbering, cross-references, and professional textbook styling.
Article layout support lets you place content in the margins, create full-width figures, or add side notes.
New options: mathfont, codefont, linestretch, linkcolor, citecolor, filecolor, thanks, and abstract-title.
Theorem styling with four appearance options: simple, fancy, clouds, or rainbow.

See this blog post for details on all the Typst improvements.

PDF Accessibility (Experimental)

We’re rolling out experimental support for PDF accessibility standards in 1.9. The new pdf-standard option enables PDF/A archival formats and PDF/UA accessibility compliance for both LaTeX and Typst outputs. Alt text from fig-alt attributes now passes through to PDF for screen reader support, and Typst gains support for alt text on cross-referenced equations.

Read more in our PDF Accessibility and Standards blog post or the documentation for LaTeX and Typst.

Output for LLMs

Quarto can now generate llms.txt format output for your website, making your content more accessible to large language models and AI-powered tools.

Enable it in your website configuration:

_quarto.yml


website:
  title: "My Documentation"
  llms-txt: true

When you render your site, Quarto creates:

An llms.txt index file at the root of your site listing all pages
A .llms.md markdown file alongside each HTML page (e.g., guide.html gets guide.llms.md)
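The naming convention in the second bullet can be sketched in a few lines of Python. This only illustrates the `guide.html` to `guide.llms.md` mapping described above; `llms_markdown_path` is a hypothetical helper, not a Quarto API.

```python
def llms_markdown_path(html_path: str) -> str:
    """Map a rendered HTML page to its companion .llms.md file name."""
    suffix = ".html"
    if not html_path.endswith(suffix):
        raise ValueError(f"expected an .html page, got {html_path!r}")
    # Swap the .html extension for .llms.md, keeping the directory part intact.
    return html_path[: -len(suffix)] + ".llms.md"


print(llms_markdown_path("guide.html"))
print(llms_markdown_path("docs/api/index.html"))
```

A tool that consumes your site could use a mapping like this to fetch the markdown version of any page it already knows the HTML URL for.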

The markdown files contain clean versions of your content—navigation, sidebars, and scripts are stripped out; tables, code blocks, and callouts are converted to standard markdown.

Read more, including how to customize what appears in LLM output, in Websites > Output for LLMs.

`quarto use brand` Command

Keep your project’s brand assets in sync with an external source using the new quarto use brand command:

Terminal

quarto use brand myorg/shared-brand

The command copies brand files from a GitHub repository, local directory, or zip archive into your project’s _brand/ directory. Quarto walks you through each step—confirming trust for remote sources, creating the directory if needed, and asking whether to overwrite or remove files.

See Guide > Brand for --dry-run, --force, and other options.

List Tables

List tables provide a new syntax for creating tables with complex content—multiple paragraphs, code blocks, or nested lists—using familiar bullet syntax instead of grid table formatting:

::: {.list-table}
- - Function
  - Description

- - `sum()`
  - Add values:

    ```python
    sum([1, 2, 3])
    ```

- - `len()`
  - Count items:

    - Works on lists
    - Works on strings
:::

Function	Description
`sum()`	Add values: `sum([1, 2, 3])`
`len()`	Count items: Works on lists; Works on strings

Each top-level bullet represents a row; nested bullets represent cells. This syntax is much easier to maintain than grid tables, especially when cells contain code or other block elements.

List tables support all the usual table features: captions, cross-references, column widths, and alignment. Thanks to Martin Fischer for the original development, with contributions from Albert Krewinkel and William Lupton.

Find all the details in Guide > Tables.

Other Highlights

Search Result Highlighting: Improved highlighting of search terms on destination pages, with persistent marks, automatic tab activation for matches inside tabsets, and cross-element highlighting for multi-word searches.
Privacy-focused features for websites:
- A privacy-first default for cookie consent: The default for cookie consent has changed to type: express, providing opt-in consent that blocks cookies until users explicitly agree. This privacy-conscious default is designed with modern privacy regulations in mind.
- Algolia Search Insights avoids cookies: Algolia Insights now uses persistent cookies only if cookie-consent is active and the user has opted in.
- Use Plausible Analytics: Add privacy-friendly Plausible Analytics to websites via the plausible-analytics configuration option.
aria-label for videos: Improve accessibility of embedded videos by providing custom descriptive labels for screen readers instead of the default “Video Player” label.
New syntax-highlighting Option: Replaces the deprecated highlight-style (Pandoc 3.8). Supports style names, custom .theme files, none, or idiomatic for format-native highlighting.
Metadata and brand extensions now work without a _quarto.yml project. A temporary default project is created in memory.
Engine extensions allow replacement of the execution engine:
- Julia is now a bundled extension instead of being built-in.
- quarto-marimo will soon change from a filter extension to an engine extension.
- New quarto create extension engine command.
- New quarto call build-ts-extension command.
- New Quarto API for engine extensions to use. (This is in flux and will not be documented for the next few releases, but there is a dev blog post about it.)

Dependency updates:

pandoc updated to 3.8.3
typst updated to 0.14.2
esbuild updated to 0.25.10
deno updated to 2.4.5
mermaid updated to 11.12.0

Acknowledgements

One of the early proposals for PDF accessibility and alt text in the LaTeX ecosystem was provided to us by Sam Schiano and Sophie Breitbart. We want to thank them for bringing to our attention the approach they used in their {asar} R package, which influenced some of our design.

In addition, we’d like to say a huge thank you to everyone who contributed to this release by opening issues and pull requests:

CoryMcCartan , DanChaltiel , Data-Wise , FrankwaP , Joao-O-Santos , LukasDSauer , MBe-iUS , MarcoPortmann , MariaBarrioSchez , MateusMolina , Selbosh , ThePurox , TucoFernandes , aecoleman , amirhome61 , andrewheiss , azankl , bensoltoff , bruvellu , byzheng , cbrnr , chendaniely , chi-raag , christopherkenny , coatless , cynthiahqy , darwindarak , davidskalinder , dmenne , fconil , fkgruber , fkohrt , fredguth , gadenbuie , github-actions[bot] , gsathler-vi , hamgamb , herosi , icarusz , idavydov , jeremy886 , jkrumbiegel , jmcphers , jonas37 , jorherre , jreades , jromanowska , jtbayly , juleswg23 , juliasilge , kathsherratt , kusnezoff-alexander , lrrichter , lwjohnst86 , maelle , matthiasbaitsch , mipmip , mstrms2000 , multimeric , mvuorre , mykolaskrynnyk , nichtich , nithinmkp , nrichers , orbsmiv , paytonej , petrelharp , phongphuhanam , pm-gusmano , posit-snyk-bot , prosoitos , rabyj , sasja-san , sbwiecko , serialc , spaette , spraetor , stragu , szimmer , the-solipsist , thomasp85 , yyzeng , zhe00a .

The airplane departure emoji in the listing and social card image for this post comes from OpenMoji, the open-source emoji and icon project. License: CC BY-SA 4.0

2026 Posit Internships

Max Kuhn — Fri, 20 Mar 2026 00:00:00 +0000

We are once again chuffed to offer summer internships.

Our internship program has been a great success over the years. If you want to know what it is like, many of our alumni have written about it:

2016: Thomas Lin Pedersen
2017: Lucy D’Agostino McGowan and Kara Woo
2018: Alex Hayes, Fanny Chow, Irene Steves, and Dana Paige Seidel
2019: Marly Gotti and Dewey Dunnington
2020: Simon Couch
2022: Mike Mahoney
2025: Frances Lin

Three past interns are current Posit employees: Thomas Lin Pedersen, Kara Woo, and Simon Couch.

2026 Positions

This year, we have four positions in different groups. The positions are US-based and run for 10 to 12 weeks, starting on May 26, 2026. See the link at the bottom for the details.

Skills and Evals Intern (PyData Team)

The PyData team is looking for an intern to help make AI agents better at using our Python open-source projects by writing skills and evaluations for common user tasks.

The core of the role is to identify the tasks users perform with our tools — such as Plotnine and Great Tables — translate them into clear skill definitions that agents can use, and build evaluations that measure whether agents can reliably complete those tasks. This includes writing prompts, creating example workflows, and developing automated tests that measure how well agents perform. A major focus will be on applying the emerging skills format, while the broader goal is to improve documentation, examples, and API design across the PyData ecosystem in ways that make our tools work better with AI-assisted workflows.

R Modeling Intern (Tidymodels Team)

The tidymodels R internship can focus on a range of tasks, including expanding content on tidymodels.org, expanding tabular deep learning models (in brulee), adding performance metrics for survival analysis models, modernizing the caret package, and/or writing Rust bindings for predictive models. The intern is welcome to suggest R-based projects focused on modeling and/or data analysis.

Shiny Accessibility and Testing Intern (Shiny Team)

The Shiny team is looking for an intern to help advance accessibility and testing across the Shiny framework. You’ll audit Shiny components against Web Content Accessibility Guidelines (WCAG), implement fixes, improve test coverage, and contribute to documentation that helps the broader community build accessible Shiny apps.

Some of the harder problems in this role aren’t strictly code problems. Shiny’s components are built for flexible, abstract usage, so you can’t always anticipate how they’ll end up on a page. Making them accessible means understanding HTML semantics and WCAG well enough to exercise good judgment and make sensible compromises when there isn’t one clear right answer. Candidates should be comfortable with Git and GitHub, have solid working knowledge of HTML/CSS, and have experience in at least one of R, Python, or JavaScript. Familiarity with WCAG, assistive technology, automated testing frameworks, or open-source workflows is a plus.

Software Engineering Intern (Posit Connect Team)

The Posit Connect team is looking for an intern to contribute to the development and quality of Connect, Posit’s professional platform for publishing and sharing data science and AI applications at scale. The primary focus of the internship will be to contribute reports and applications to the Connect Gallery, an open-source collection of useful extensions and example content. These Python, R, and Quarto projects help data science teams realize the full potential of the product and allow us to experiment with new features. In the process of building these apps, you will have the opportunity to contribute to the Connect product as well.

Applying

To apply, make sure that you have a GitHub handle and follow this link:

https://posit.co/job-detail/?gh_jid=7674250003

We can’t wait to get started and look forward to reading your applications.

Native Jupyter Notebook Support Has Arrived in Positron

Cindy Tong — Mon, 16 Mar 2026 00:00:00 +0000

Positron now ships with a native Jupyter Notebook Editor, a new unified experience we built from the ground up for working with Jupyter notebooks within Positron.

Why we built our own notebook editor

We built the Positron Notebook Editor to treat your .ipynb files as first-class citizens in an IDE tailored specifically for data science workflows.

Up to this point, Positron used the legacy Code OSS notebook editor that powers VS Code. While functional, this editor was designed for general-purpose development and not specifically for data science workflows. The tradeoffs show up in small ways that compound over time: limited context for AI assistance, no deep integration with your variables or data, and a user experience that treats .ipynb files as just another file type.

We wanted notebooks to feel like a first-class part of a data science IDE, so we built our own native notebook editor.

If you missed the original February announcement, that post covers our initial reasoning in more detail.

What’s included out of the box

The Positron Notebook Editor brings the core capabilities of Positron directly into your notebook workflow:

Variables Pane: Variables update in real time as you run cells. No need to print or inspect manually.

Data Explorer: When a cell returns a Pandas or Polars DataFrame, you get an inline data viewer. Open the full Data Explorer to sort, filter, and profile your data. Any filtering or cleaning you do can be converted into code, so your analysis stays reproducible without writing repetitive df.head() or df.describe() calls.

AI Assistant: The Assistant has access to your notebook’s full context, including cell states, execution history, and outputs like images and tables. It can suggest edits, reorder cells, and run code with your permission. You can inspect exactly what context it’s using and follow along as it works.

Help Pane: Python and R documentation is available inline, with hyperlinks, without switching to a browser.

Publisher: Deploy your .ipynb notebooks directly to Connect or Connect Cloud, where you can manage access, schedule runs, and view telemetry.

A sample notebook workflow

Now that you have all these capabilities in one place, your workflow might look something like this:

Import your data using Pandas or Polars.
Run your notebook cells and watch variables update in the pane as cells run.
Explore your DataFrame in the inline Data Explorer. Sort and filter without writing any code.
Use Assistant to generate a visualization based on your filtered data or AI quick actions to recommend next steps.
When the analysis is ready to share, use an AI action to add markdown headers and notes.
Publish the notebook to Connect or Connect Cloud to share with your colleagues.

What’s coming next

The roadmap includes SQL support, improved version control, R improvements, and more. You can view and vote on items in the GitHub roadmap.

Get started with the alpha

Download Positron and install a release from February 2026 or later.
Enable the alpha by setting positron.notebook.enabled to true in your settings.
Try the tutorial repository for examples that use the new features.
Share feedback in GitHub Discussions or book time to talk with us directly.

We’re excited to hear how you use the Positron Notebook Editor as we continuously improve the experience.

orbital 0.5.0

Emil Hvitfeldt — Fri, 13 Mar 2026 00:00:00 +0000

We’re over the moon to announce the release of orbital 0.5.0. orbital lets you predict in databases using tidymodels workflows. orbital uses tidypredict under the hood to translate fitted models into expressions. This post will also cover things from tidypredict’s 1.1.0 release.

This blog post is about the R orbital package, but there’s also a Python version that works on scikit-learn models.

You can install orbital and tidypredict from CRAN with:

install.packages(c("orbital", "tidypredict"))

This blog post will cover the highlights, which are support for more models, faster performance, and more vignettes.

You can see a full list of changes in the orbital release notes and tidypredict release notes.

Newly supported models

We have added support for new models as well as more prediction types for existing supported models.

The newly supported models are:

decision_tree(engine = "rpart")
boost_tree(engine = "lightgbm")
boost_tree(engine = "catboost") (More on this soon)

All of these support regression, classification, and probability estimates.

The following models now also support classification and probability estimates in addition to regression:

mars(engine = "earth")
multinom_reg(engine = "glmnet")
rand_forest(engine = "randomForest")
rand_forest(engine = "ranger")

If there is a model type you specifically need, please let us know so we can prioritize new additions.

Nested `case_when()` support

All tree-based models were previously implemented as a flat case_when() statement. This means that a small tree with 3 leaves would look like this:

case_when(
  x <= 5 & y <= 3 ~ "low",
  x <= 5 & y > 3  ~ "med",
  x > 5           ~ "high"
)

And while this works, it comes with a number of downsides. In this example we have to calculate x <= 5 more than once. This might not be a big deal in a tree of this size, but it compounds very fast as the tree grows deeper.

We are also not using the information effectively. Since trees are exhaustive, we shouldn’t have to calculate the last condition, as all other choices have been ruled out. With these considerations in mind, we have switched all trees to be expressed as nested case_when() statements.

case_when(
  x <= 5 ~ case_when(
    y <= 3 ~ "low",
    .default = "med"
  ),
  .default = "high"
)

This case_when() evaluates exactly the same as the previous flat case_when() statement. While it might be a little harder to read, it provides a large performance benefit: each condition is evaluated at most once, which has a big influence on computational speed.

This also means that the R version of orbital now matches what the Python version of orbital does when creating a tree.
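To see the difference in work, the two forms can be mimicked outside a database. The sketch below is written in Python rather than R only to keep it self-contained; it is not orbital's generated code. The `check` wrapper, `flat_tree`, `nested_tree`, and `count_checks` are hypothetical names, and Python's short-circuiting `and` is assumed as a stand-in for how a query engine might evaluate the conditions.

```python
def flat_tree(x, y, check):
    """Flat form: every leaf re-tests the full path from the root."""
    if check(x <= 5) and check(y <= 3):
        return "low"
    if check(x <= 5) and check(y > 3):
        return "med"
    if check(x > 5):
        return "high"
    return None  # not reached for ordinary numbers


def nested_tree(x, y, check):
    """Nested form: each split is tested once, the rest is the default branch."""
    if check(x <= 5):
        return "low" if check(y <= 3) else "med"
    return "high"


def count_checks(tree, x, y):
    """Run a tree on one point and count how many conditions were evaluated."""
    calls = 0

    def check(condition):
        nonlocal calls
        calls += 1
        return condition

    label = tree(x, y, check)
    return label, calls


for point in [(3, 2), (3, 4), (6, 0)]:
    print(point, count_checks(flat_tree, *point), count_checks(nested_tree, *point))
```

Both functions return the same label for every point, but for the "med" leaf the flat form evaluates four conditions while the nested form evaluates two. The gap widens as trees get deeper, because the flat form repeats the whole root-to-leaf path for each leaf.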

New `separate_trees` argument

Some models, like the ensemble tree models, can be represented as a combination of multiple smaller models. This typically manifests as a single massive expression in the following format:

.pred = "(tree1) + (tree2) + (tree3) + ... + (tree100)"

This can create trouble for two main reasons. First, we can hit the database’s expression nesting depth limit when executing these expressions if there are too many trees or the trees are too deep. Second, and relatedly, databases cannot recognize that these trees could be calculated in parallel and combined afterwards.

This is where the new separate_trees argument comes in. When setting separate_trees = TRUE in orbital() you change the internal representation of the orbital object to not have a single massive expression for .pred and instead split them out into multiple expressions like so.

.pred_tree_001 = "case_when(...)"
.pred_tree_002 = "case_when(...)"
.pred_tree_003 = "case_when(...)"
...
.pred = ".pred_tree_001 + .pred_tree_002 + .pred_tree_003 + ..."

This representation allows the database query optimizer to potentially evaluate trees in parallel, since each intermediate column is independent.
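The shape of that representation can be sketched as a mapping from column names to expressions. This is an illustration in Python of the naming pattern shown above, not orbital's internal code; `separate_tree_columns` is a hypothetical helper and the `case_when(...)` strings are placeholders.

```python
def separate_tree_columns(tree_exprs: list[str]) -> dict[str, str]:
    """Give each tree its own column, then sum the columns in a final .pred."""
    columns = {
        f".pred_tree_{i:03d}": expr for i, expr in enumerate(tree_exprs, start=1)
    }
    # The final prediction only references the intermediate column names,
    # so no single expression has to contain every tree.
    columns[".pred"] = " + ".join(columns)
    return columns


trees = ["case_when(...)", "case_when(...)", "case_when(...)"]
for name, expr in separate_tree_columns(trees).items():
    print(f"{name} = {expr}")
```

Because `.pred` refers to the per-tree columns by name, its nesting depth stays constant no matter how many trees the ensemble has.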

The separate_trees argument works for the following engines.

xgboost
lightgbm
catboost
ranger
randomForest

This change alone allows us to work with model types previously not possible with orbital. Together with the nested tree support you can now productionize some of the most popular machine learning models.

Splines support

Spline transformations are commonly used in preprocessing to capture non-linear relationships between predictors and the outcome.

With this release, orbital now supports step_spline_b(), step_spline_convex(), step_spline_monotone(), step_spline_natural(), and step_spline_nonnegative() from the recipes package. Under the hood, splines are translated into piecewise polynomial expressions that can be evaluated directly in SQL.

More vignettes

We have added a handful of new vignettes as well in this release.

SQL expression sizes: Goes over how different hyperparameters in models affect SQL sizes. This is especially useful when working with boosted trees, as many different combinations of hyperparameters produce similar performance at different SQL expression sizes. With a little effort you could pick a model that runs 10-100 times faster with minimal loss in predictive performance.
Parallel tree evaluation in databases: A more in-depth look at how the separate_trees argument works. Also includes a section on why and when you should consider using it.
Database deployment: Shows examples of how we can deploy an orbital model using tables and views.
Float precision at split boundaries: Some models, like xgboost and Cubist, operate on 32-bit floats instead of the 64-bit doubles we have in R. This can cause predictions to not match exactly. If you use any of these models, you should read this vignette to see whether this issue is a dealbreaker for you.

Acknowledgements

A special thanks to Emily Riederer who helped workshop and benchmark these new features.

Outgrowing your laptop with R and Positron

Julia Silge — Thu, 05 Mar 2026 00:00:00 +0000

My data is too big for my laptop!

Last week, I had the pleasure of giving a talk to R-Ladies Abuja about how Positron can grow with you as you work on data that is too large for your laptop. The talk was recorded, and you can find it on YouTube.

I opened this talk discussing how I first learned about these “beyond your laptop” technologies, typically in an organization where these technologies were already in use and seemed specific to infrastructure there. I later came to understand that these technologies are actually related to each other, and understanding one can really help when you need to pick up another one. I also pointed out some of the Positron features that are designed to make it easier to work with these technologies.

Check out my slides

If you’d like to check out my slides, they are available as well.

PDF Accessibility and Standards

Gordon Woodhull — Thu, 05 Mar 2026 00:00:00 +0000

Pre-release Feature

This feature is new in the upcoming Quarto 1.9 release. To use the feature now, you’ll need to download and install the Quarto pre-release.

2025 was a big year for PDF accessibility. LaTeX and Typst both released support for PDF tagging and accessibility standards, just in time for new regulations in the EU (June 2025) and US (April 2026).

Quarto 1.9 brings this support to you as a Quarto user.

What PDF Standards Do

Currently LaTeX supports the newer UA-2 standard, and Typst supports the older UA-1 standard. Typst is likely to have UA-2 support later in 2026.

Both standards instruct the PDF renderer to provide screen readers with:

The semantic structure of the text (title, heading, paragraph, figure, etc.)
The natural reading order
Spatial coordinates for highlighting and assistive navigation
Required metadata such as title and language

How to enable a PDF Standard in Quarto

In Quarto 1.9, specify a PDF standard for your document or project with pdf-standard:

PDF (LaTeX)

format:
  pdf:
    pdf-standard: ua-2

Typst

format:
  typst:
    pdf-standard: ua-1

pdf-standard takes a single standard name or list of standard names. PDF version is used if provided in the list, but otherwise inferred from the standard.

If you specify a PDF standard, Quarto first instructs LaTeX or Typst to use the standard when producing the PDF, and then validates the output PDF against the standard using veraPDF, an open-source PDF validation tool. If veraPDF is not installed, you’ll get a warning but still receive a PDF – it just won’t be validated.

Installing veraPDF

To install veraPDF, you’ll first need Java, then run:

Terminal

quarto install verapdf

When a document passes validation, you’ll see output like:

[verapdf]: Validating my-document.pdf against PDF/UA-2... PASSED

Creating accessible PDFs

Quarto’s Markdown-based workflow handles many accessibility requirements automatically:

Document metadata (title, author, date, language) flows into the PDF’s built-in metadata fields.
The semantic structure of Markdown satisfies PDF tagging requirements. For Typst this is always enabled; for LaTeX it is enabled when you specify a standard that requires it.
Alt text for images is carried through to the PDF for screen readers.

But you do need to make sure your document has:

A title in the YAML front matter.
Alt text for every image, specified with fig-alt. See Figures for details.

See the LaTeX and Typst documentation for more details.

If your document fails validation

LaTeX does not perform validation during PDF generation, so if veraPDF validation fails, that’s a warning, and you still get a partially-accessible PDF as long as you use pdf-standard: ua-2.

Typst fails and does not produce a PDF if its built-in validation fails during PDF generation. However, in Typst all accessibility features are on by default, so you can generate a partially-accessible PDF by rendering without pdf-standard.

Current limitations

We ran our test suite – 188 LaTeX examples and 317 Typst examples – to find where Quarto PDFs do not yet pass UA-1 or UA-2, and where users will need to change their documents.

LaTeX

Margin content is the biggest structural blocker. If you use .column-margin divs, cap-location: margin, reference-location: margin, or citation-location: margin, the resulting PDF will not pass UA-2. The underlying sidenotes and marginnote LaTeX packages do not cooperate with PDF tagging.

(Margin content does work with Typst and passes UA-1 – see Typst Article Layout.)

There are smaller upstream issues in Pandoc, LaTeX, and LaTeX packages, documented here.

Typst

In our tests, Typst caught every UA-1 violation and refused to generate the PDF. veraPDF did not detect any violation that Typst had missed.

Typst also seems to do a very good job of generating UA-1 compliant output by default – almost all errors were due to missing titles or missing alt text.

However, we did discover that Typst books are not yet compliant. There is a structural problem with the Typst orange-book package and we’ll work with the maintainers to correct it.

Conclusion

Although Typst currently targets the earlier UA-1 standard, today it seems to offer better PDF accessibility than LaTeX.

We expect PDF accessibility support to improve through the LaTeX ecosystem throughout 2026 as awareness of UA-2 and the new regulations spreads.

If you run into accessibility issues with PDF output, please search the Quarto discussions and open a new one with the accessibility label for any issues you discover.

Rapp 0.3.0

Tomasz Kalinowski — Wed, 18 Feb 2026 00:00:00 +0000

We’re excited to share our first tidyverse blog post for Rapp, alongside the 0.3.0 release. Rapp helps you turn R scripts into polished command-line tools, with argument parsing and help generation built in.

Why a command-line interface for R?

A command-line interface (CLI) lets you run programs from a terminal, without opening an IDE or starting an interactive R session. This is useful when you want to:

automate tasks via cron jobs, scheduled tasks, or CI/CD pipelines
chain R scripts together with other tools in data pipelines
let others run your R code without needing to know R
package reusable tools that feel native to the terminal
expose specific actions through a clean interface that LLM agents can invoke

There are several established packages for building CLIs in R, including argparse, optparse, and docopt, where you explicitly parse and handle command-line arguments in code. Rapp takes a different approach: it derives the CLI surface from the structure of your R script and injects values at runtime, so you never need to handle CLI arguments manually.

How Rapp works

At its core, Rapp is an alternative front-end to R: a drop-in replacement for Rscript that automatically turns common R expression patterns into command-line options, switches, positional arguments, and subcommands. You write normal R code and Rapp handles the CLI surface.

Rapp also uses special #| comments (similar to Quarto’s YAML-in-comments syntax) to add metadata such as help descriptions and short aliases.

A tiny example

Here’s a complete Rapp script (from the package examples), a coin flipper:

#!/usr/bin/env Rapp
#| name: flip-coin
#| description: |
#|   Flip a coin.

#| description: Number of coin flips
#| short: 'n'
flips <- 1L

sep <- " "
wrap <- TRUE

seed <- NA_integer_
if (!is.na(seed)) {
  set.seed(seed)
}

cat(sample(c("heads", "tails"), flips, TRUE), sep = sep, fill = wrap)

Let’s break down how Rapp interprets this script:

R code	Generated CLI option	What it does
`flips <- 1L`	`--flips` or `-n`	Integer option with default of 1
`sep <- " "`	`--sep`	String option with default of `" "`
`wrap <- TRUE`	`--wrap` / `--no-wrap`	Boolean toggle (TRUE/FALSE becomes on/off)
`seed <- NA_integer_`	`--seed`	Optional integer (NA means “not set”)

The #| short: 'n' comment adds -n as a short alias for --flips. The #!/usr/bin/env Rapp line (called a “shebang”) lets you run the script directly on macOS and Linux without typing Rapp first.
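The mapping in the table can be mimicked in a few lines of base R. The sketch below is not Rapp's actual parser: infer_options() is a hypothetical helper written for this illustration, and it only shows the general idea that a top-level assignment of a scalar literal carries enough information (a name and a type) to declare an option.

```r
# A cut-down version of the script above, held as text
script <- '
flips <- 1L
sep <- " "
wrap <- TRUE
seed <- NA_integer_
cat("not an option\n")
'

# Hypothetical helper: find top-level `name <- <scalar literal>` assignments
# and report the type of each default value.
infer_options <- function(text) {
  exprs <- parse(text = text, keep.source = FALSE)
  opts <- list()
  for (e in exprs) {
    is_assign <- is.call(e) &&
      identical(e[[1]], as.name("<-")) &&
      is.name(e[[2]])
    if (is_assign && is.atomic(e[[3]]) && length(e[[3]]) == 1L) {
      opts[[as.character(e[[2]])]] <- typeof(e[[3]])
    }
  }
  unlist(opts)
}

infer_options(script)
#>       flips         sep        wrap        seed
#>   "integer" "character"   "logical"   "integer"
```

The cat() call is skipped because it is not an assignment, which mirrors how only declarative patterns become part of the CLI surface.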

Running the script

With Rapp installed and flip-coin available on your PATH (see Get started below), you can run the app from the terminal:

flip-coin -n 3
#> heads tails heads

flip-coin --seed 42 -n 5
#> tails heads tails tails heads

Auto-generated help

Rapp generates --help from your script (and --help-yaml if you want a machine-readable spec):

flip-coin --help

Usage: flip-coin [OPTIONS]

Flip a coin.

Options:
  -n, --flips   Number of coin flips [default: 1] [type: integer]
  --sep           [default: " "] [type: string]
  --wrap / --no-wrap   [default: true] Disable with `--no-wrap`.
  --seed         [default: NA] [type: integer]

Breaking change in 0.3.0: positional arguments are now required by default

If you’re upgrading from an earlier version of Rapp, note that positional arguments are now required unless explicitly marked optional.

# Before 0.3.0: this positional was optional
name <- NULL

# In 0.3.0+: add this comment to keep it optional
#| required: false
name <- NULL

If your scripts use positional arguments with NULL defaults that should remain optional, add #| required: false above them.

Highlights in 0.3.0

Rapp will be new to most readers, so rather than listing every change, here are the main ideas (and what’s improved in 0.3.0).

Options, switches, and repeatable flags from plain R

Rapp recognizes a small set of “declarative” patterns at the top level of your script:

Scalar literals like flips <- 1L become options like --flips 10.
Logical defaults like wrap <- TRUE become toggles like --wrap / --no-wrap.
#| short: n adds a short alias like -n (new in 0.3.0).
c() and list() defaults declare repeatable options (new in 0.3.0): callers can supply the same flag multiple times and values are appended.

Subcommands with `switch()`

Rapp can now turn a switch() block into subcommands (and you can nest switch() blocks for nested commands). Here’s a small sketch of a todo-style app:

#!/usr/bin/env Rapp
#| name: todo
#| description: Manage a simple todo list.

#| description: Path to the todo list file.
#| short: s
store <- ".todo.yml"

switch(
  command <- "",

  #| description: Display the todos
  list = {
    limit <- 30L
    # ...
  },

  #| description: Add a new todo
  add = {
    task <- NULL
    # ...
  }
)

Help is scoped to the command you’re asking about, so todo --help lists the commands, and todo list --help shows just the options/arguments for list (plus any parent/global options).

Installable launchers for package CLIs

A big part of sharing CLI tools is making them easy to run after installation. In 0.3.0, install_pkg_cli_apps() installs lightweight launchers for scripts in a package’s exec/ directory that use either #!/usr/bin/env Rapp or #!/usr/bin/env Rscript:

Rapp::install_pkg_cli_apps("mypackage")

(There’s also uninstall_pkg_cli_apps() to remove a package’s launchers.)

Get started

Here’s the quickest path to your first Rapp script:

# 1. Install the package
install.packages("Rapp")

# 2. Install the command-line launcher
Rapp::install_pkg_cli_apps("Rapp")

Then create a script (e.g., hello.R):

#!/usr/bin/env Rapp
#| name: hello
#| description: Say hello

name <- "world"
cat("Hello,", name, "\n")

And run it:

Rapp hello.R --name "R users"
#> Hello, R users

Learn more

To dig deeper into Rapp:

browse examples in the package: system.file("examples", package = "Rapp")
read the full documentation: https://github.com/r-lib/Rapp
note that Rapp requires R ≥ 4.1.0

If you try Rapp, we’d love feedback! We especially want to hear about your experiences with edge cases in argument parsing, help output, and how commands should feel. Issues and ideas are welcome at https://github.com/r-lib/Rapp/issues.

mirai 2.6.0

Charlie Gao — Thu, 12 Feb 2026 00:00:00 +0000

mirai 2.6.0 is now on CRAN. mirai is R’s framework for parallel and asynchronous computing. If you’re fitting models, running simulations, or building Shiny apps, mirai lets you spread that work across multiple processes – locally or on remote infrastructure.

With this release, it bridges the gap between your laptop and enterprise infrastructure – the same code you prototype locally now deploys to Posit Workbench or any cloud HTTP API, with a single function call.

You can install it from CRAN with:

install.packages("mirai")

The flagship feature for this release is the HTTP launcher for deploying daemons to cloud and enterprise platforms. This release also brings a C-level dispatcher for minimal task dispatch overhead, race_mirai() for process-as-completed patterns, synchronous mode for debugging, and daemon synchronization for remote deployments. You can see a full list of changes in the release notes.

How mirai works

If you’ve ever waited for a loop to finish fitting models, processing files, or calling APIs, mirai can help. Any task that’s repeated independently across items is a candidate for parallel execution.

The previous release post covered mirai’s design philosophy in detail. Here’s a brief overview for readers encountering mirai for the first time.

library(mirai)
# Set up 4 background processes
daemons(4)

# Send work -- non-blocking, returns immediately
m <- mirai({
  Sys.sleep(1)
  100 + 42
})
m
#> < mirai [] >

# Collect the result when ready
m[]
#> [1] 142

# Shut down
daemons(0)

That’s mirai in a nutshell: daemons() to set up workers, mirai() to send work, [] to collect results. Everything else builds on this.

In mirai’s hub architecture, the host session listens at a URL and daemons – background R processes that do the actual work – connect to it. You send tasks with mirai(), and the dispatcher routes them to available daemons in first-in, first-out (FIFO) order.

This design enables dynamic scaling: daemons can connect and disconnect at any time without disrupting the host. Add capacity when you need it, release it when you don’t.

A single compute profile can mix daemons launched by different methods, and you can run multiple profiles simultaneously to direct different tasks to different resources. The basic syntax for each deployment method:

Deploy to	Setup
Local	`daemons(4)`
Remote (SSH)	`daemons(url = host_url(), remote = ssh_config(...))`
HPC cluster (Slurm, SGE, PBS, LSF)	`daemons(url = host_url(), remote = cluster_config())`
HTTP API / Posit Workbench	`daemons(url = host_url(), remote = http_config())`

Change one line and your local prototype runs on a Slurm cluster. Change it again and it runs on Posit Workbench. Your analysis code stays identical.

The async foundation for the modern R stack

mirai has become the convergence point for asynchronous and parallel computing across the R ecosystem.

It is the recommended async backend for Shiny – if you’re building production Shiny apps, you should be using mirai. It is the only async backend for the next-generation plumber2 – if you’re building APIs with plumber2, you’re already using mirai.

It is the parallel backend for purrr – if you use map(), mirai is how you make it parallel. Wrap your function in in_parallel(), set up daemons, and your map calls run across all of them:

library(purrr)
daemons(4)
models <- split(mtcars, mtcars$cyl) |>
  map(in_parallel(\(x) lm(mpg ~ wt + hp, data = x)))
daemons(0)

It powers targets – the pipeline orchestration tool for reproducible analysis. And most recently, ragnar – the Tidyverse package for retrieval-augmented generation (RAG) – adopted mirai for its parallel processing.

As an official alternative communications backend for R’s parallel package, mirai underpins workflows from interactive web applications to pipeline orchestration to AI-powered document processing.

Learn mirai, and you’ve learned the async primitive that powers the modern R stack. The same two concepts – daemons() to set up workers, mirai() to send work – are all you need to keep a Shiny app responsive or run async tasks in production.

HTTP launcher

This release extends the “deploy everywhere” principle with http_config(), a new remote launch configuration that deploys daemons via HTTP API calls, so it works with any platform that exposes an HTTP API for launching jobs.

Posit Workbench

Many organizations use Posit Workbench to run research and data science at scale. mirai now integrates directly with it.¹ Call http_config() with no arguments and it auto-configures using the Workbench environment:

daemons(n = 4, url = host_url(), remote = http_config())

That’s it. Four daemons launch as Workbench jobs, connect back to your session, and you can start sending work to them.

Here’s what that looks like in practice: you’re developing a model in your Workbench session. Fitting it locally is slow. Add that line, and those fits fan out across four Workbench-managed compute jobs. When you’re done, daemons(0) releases them. No YAML, no job scripts, no leaving your R session – resource allocation, access control, and job lifecycle are all handled by the platform.

If you’ve been bitten by expired tokens in long-running sessions, http_config() is designed to prevent that. Under the hood, it stores functions rather than static values for credentials and endpoint URLs. These functions are called at the moment daemons actually launch, so session cookies and API tokens are always fresh – even if you created the configuration hours earlier.
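The difference between storing a value and storing a function can be seen in a small base R sketch. This is only an illustration of the idea, not mirai's internals: token_store, static_cfg, lazy_cfg, and resolve() are hypothetical names introduced here.

```r
# A stand-in for a credential source whose value changes over time
token_store <- new.env()
token_store$current <- "token-A"

# Static value: captured once, when the configuration is created
static_cfg <- list(token = token_store$current)

# Function: looked up again each time it is called
lazy_cfg <- list(token = function() token_store$current)

# Hours later, the token has been rotated
token_store$current <- "token-B"

# Hypothetical helper: call functions, pass plain values through
resolve <- function(x) if (is.function(x)) x() else x

resolve(static_cfg$token)
#> [1] "token-A"
resolve(lazy_cfg$token)
#> [1] "token-B"
```

The static configuration still holds the stale token, while the function-based one picks up the current value at the moment it is resolved.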

See the mirai vignette for troubleshooting remote launches.

Custom APIs

The HTTP launcher works with any HTTP API, not just Workbench. Supply your own endpoint, authentication, and request body:

daemons(
  n = 2,
  url = host_url(),
  remote = http_config(
    url = "https://api.example.com/launch",
    method = "POST",
    token = function() Sys.getenv("MY_API_KEY"),
    data = '{"command": "%s"}'
  )
)

The "%s" placeholder in data is where mirai inserts the daemon launch command at launch time. Each argument can be a plain value or a function – use functions for anything that changes between launches (tokens, cookies, dynamic URLs).

This opens up a wide range of deployment targets: Kubernetes job APIs, other cloud container services, or any internal job scheduler with an HTTP interface. If you can launch a process with an HTTP call, mirai can use it.

C-level dispatcher

The overhead of distributing your tasks is now negligible. In a mirai_map() over thousands of items, what you measure is the time of your actual computation, not the framework – per-task dispatch overhead is now in the tens of microseconds, where existing R parallelism solutions typically operate in the millisecond range.

Under the hood, the dispatcher – the process that sits between your session and the daemons, routing tasks to available workers – has been re-implemented entirely in C code within nanonext . This eliminates the R interpreter overhead that remained, while the dispatcher continues to be event-driven and consume zero CPU when idle.

This also removes the bottleneck when coordinating large numbers of daemons, which matters directly for the kind of scaled-out deployments that the HTTP launcher enables – dozens of Workbench jobs or cloud instances all connecting to a single dispatcher. The two features are designed to work together: deploy broadly, dispatch efficiently. mirai is built to scale from 2 cores on your laptop to 200 across a cluster, without the framework slowing you down.

`race_mirai()`

race_mirai() lets you process results as they arrive, rather than waiting for the slowest task. Suppose you’re fitting 10 models with different hyperparameters in parallel – some converge quickly, others take much longer. Without race_mirai(), you wait for the slowest fit to complete before seeing any results. With it, you can inspect or save each model the instant it finishes – updating a progress display, freeing memory, or deciding whether to continue the remaining fits at all.

race_mirai() returns the integer index of the first resolved mirai. This makes the “process as completed” pattern clean and efficient:

daemons(4)

# Launch 10 model fits in parallel
fits <- lapply(param_grid, function(p) mirai(fit_model(data, p), data = data, p = p))

# Process each result as soon as it's ready
remaining <- fits
while (length(remaining) > 0) {
  idx <- race_mirai(remaining)
  cat("Finished model with params:", remaining[[idx]]$data$p, "\n")
  remaining <- remaining[-idx]
}

daemons(0)

Send off a batch of tasks, then process results in the order they finish – no polling, no wasted time waiting on the slowest one. If any mirai is already resolved when you call race_mirai(), it returns immediately. This pattern applies whenever tasks have variable completion times – parallel model fits, API calls, simulations, or any batch where you want to stream results as they land.

Synchronous mode

When tasks don’t behave as expected, you need a way to inspect them interactively.

Without synchronous mode, errors in a mirai return as miraiError objects – you can see that something went wrong, but you can’t step through the code to find out why. The task ran in a separate process, and by the time you see the error, that process has moved on.

daemons(sync = TRUE), introduced in 2.5.1, solves this. It runs everything in the current process – no background processes, no networking – just sequential execution. You can use browser() and other interactive debugging tools directly:

daemons(sync = TRUE)
mirai(
  {
    browser()
    mypkg::some_complex_function(x)
  },
  x = my_data
)

You can scope synchronous mode to a specific compute profile, isolating the problematic task for inspection while the rest of your pipeline keeps running in parallel.

Daemon synchronization with `everywhere()`

everywhere() runs setup operations on all daemons – loading packages, sourcing scripts, or preparing datasets – so they’re ready before you send work.

When launching remote daemons – via SSH, HPC schedulers, or the new HTTP launcher – there’s an inherent delay between requesting a daemon and that daemon being ready to accept work. The new .min argument ensures that setup has completed on at least that many daemons before returning:

daemons(n = 8, url = host_url(), remote = http_config())

# Wait until all 8 daemons are connected before continuing
everywhere(library(mypackage), .min = 8)

# Now send work once all daemons are ready
mp <- mirai_map(tasks, process)

This creates a synchronization point, ensuring your pipeline doesn’t start sending work before all daemons are ready. It’s especially useful for remote deployments where connection times are unpredictable.

Minor improvements and fixes

miraiError objects now have conditionCall() and conditionMessage() methods, making them easier to use with R’s standard condition handling.
The default exit behavior for daemons has been updated with a 200ms grace period before forceful termination, which allows OpenTelemetry disconnection events to be traced.
OpenTelemetry span names and attributes have been revised to better follow semantic conventions.
daemons() now properly validates that url is a character value where supplied.
Fixed a bug where repeated mirai cancellation could sometimes cause a daemon to exit prematurely.

Try it now

install.packages("mirai")
library(mirai)

daemons(4)
system.time(mirai_map(1:4, \(x) Sys.sleep(1))[])
#>    user  system elapsed
#>   0.000   0.001   1.003
daemons(0)

Four one-second tasks, one second of wall time. If those were four model fits that each took a minute, you’d go from four minutes down to one – and if you needed more power, switching to Workbench or a Slurm cluster is a one-line change. Visit mirai.r-lib.org for the full documentation.

Acknowledgements

A big thank you to all the folks who helped make this release happen:

@agilly, @aimundo, @barnabasharris, @beevabeeva, @boshek, @eliocamp, @jan-swissre, @jeroenjanssens, @kentqin-cve, @mcol, @michaelmayer2, @pmac0451, @r2evans, @shikokuchuo, @t-kalinowski, @VincentGuyader, @wlandau, and @xwanner.

¹ Requires Posit Workbench version 2026.01 or later, which enables launcher authentication using the session cookie.

`dplyr::if_else()` and `dplyr::case_when()` are up to 30x faster

Davis Vaughan — Tue, 10 Feb 2026 00:00:00 +0000

In this technical post, we’ll dive into some performance improvements we’ve made to dplyr 1.2.0 to make if_else() and case_when() up to 30x faster and use up to 10x less memory.

If you haven’t seen our previous post about the exciting new features in dplyr 1.2.0, you’ll want to go check that out first!

Here’s a before-and-after benchmark with if_else():

# Using https://github.com/DavisVaughan/cross
cross::bench_versions(pkgs = c("tidyverse/dplyr@v1.1.4", "tidyverse/dplyr"), {
  library(dplyr)
  set.seed(123)

  condition <- sample(c(TRUE, FALSE, NA), size = 1e7, replace = TRUE)
  x <- sample(10, size = 1e7, replace = TRUE)
  y <- sample(10, size = 1e7, replace = TRUE)
  z <- sample(10, size = 1e7, replace = TRUE)

  bench::mark(if_else = if_else(condition, x, y, missing = z))
})

#> # A tibble: 2 × 6
#>   pkg                    expression      min   median `itr/sec` mem_alloc
#>   <chr>                  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 tidyverse/dplyr@v1.1.4 if_else    248.25ms 249.25ms      4.02   381.6MB
#> 2 tidyverse/dplyr        if_else      7.27ms   7.51ms    132.      38.2MB

And with case_when():

cross::bench_versions(pkgs = c("tidyverse/dplyr@v1.1.4", "tidyverse/dplyr"), {
  library(dplyr)
  set.seed(123)

  column <- sample(100, size = 1e7, replace = TRUE)

  x_condition <- column < 20
  y_condition <- column < 50
  z_condition <- column < 80

  x <- sample(10, size = 1e7, replace = TRUE)
  y <- sample(10, size = 1e7, replace = TRUE)
  z <- sample(10, size = 1e7, replace = TRUE)

  bench::mark(
    case_when = case_when(
      x_condition ~ x,
      y_condition ~ y,
      z_condition ~ z
    )
  )
})

#> # A tibble: 2 × 6
#>   pkg                    expression      min   median `itr/sec` mem_alloc
#>   <chr>                  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 tidyverse/dplyr@v1.1.4 case_when   228.3ms  231.2ms      4.33   419.9MB
#> 2 tidyverse/dplyr        case_when    15.5ms   15.8ms     62.8     38.3MB

So a 33x speed improvement for if_else(), a 15x speed improvement for case_when(), and a 10x improvement in memory usage for both! In the rest of this post, we’ll explain how we’ve achieved these numbers.
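As a quick check of that arithmetic, the ratios can be recomputed from the median timings and memory figures in the two benchmark tables. The numbers below are copied from those tables; the data frame and column names are just for this sketch.

```r
# Median timings (ms) and memory allocations (MB) from the benchmark tables
timings <- data.frame(
  fn     = c("if_else", "case_when"),
  old_ms = c(249.25, 231.2),
  new_ms = c(7.51, 15.8),
  old_mb = c(381.6, 419.9),
  new_mb = c(38.2, 38.3)
)

# Ratios of old to new, rounded to whole numbers
timings$speedup <- round(timings$old_ms / timings$new_ms)
timings$memory_ratio <- round(timings$old_mb / timings$new_mb)

timings[, c("fn", "speedup", "memory_ratio")]
```

This gives a speedup of 33 for if_else() and 15 for case_when(), with memory ratios of 10 and 11 respectively.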

library(dplyr)

Let’s talk memory

We’ll start with case_when(), because if_else() is actually just a small variant of that.

The most important place to start is with the memory usage. Memory usage and raw speed are often related, as allocating memory takes time. Let’s look at the memory usage of case_when() in dplyr 1.1.4:

set.seed(123)

column <- sample(100, size = 1e7, replace = TRUE)

x_condition <- column < 20
y_condition <- column < 50
z_condition <- column < 80

x <- sample(10, size = 1e7, replace = TRUE)
y <- sample(10, size = 1e7, replace = TRUE)
z <- sample(10, size = 1e7, replace = TRUE)

profmem::profmem(
  threshold = 1000,
  case_when(
    x_condition ~ x,
    y_condition ~ y,
    z_condition ~ z
  )
)
#> Rprofmem memory profiling of:
#> case_when(x_condition ~ x, y_condition ~ y, z_condition ~ z)
#>
#> Memory allocations (>= 1000 bytes):
#>        what     bytes                                           calls
#> 1     alloc  40000048     case_when() -> vec_case_when() -> vec_rep()
#> 2     alloc  40000048                  case_when() -> vec_case_when()
#> 3     alloc  40000056       case_when() -> vec_case_when() -> which()
#> 4     alloc   7600664       case_when() -> vec_case_when() -> which()
#> 5     alloc  40000048                  case_when() -> vec_case_when()
#> 6     alloc  40000056       case_when() -> vec_case_when() -> which()
#> 7     alloc  12003312       case_when() -> vec_case_when() -> which()
#> 8     alloc  40000048                  case_when() -> vec_case_when()
#> 9     alloc  40000056       case_when() -> vec_case_when() -> which()
#> 10    alloc  11996112       case_when() -> vec_case_when() -> which()
#> 11    alloc  40000056       case_when() -> vec_case_when() -> which()
#> 12    alloc   8400112       case_when() -> vec_case_when() -> which()
#> 13    alloc   7600664   case_when() -> vec_case_when() -> vec_slice()
#> 14    alloc  12003312   case_when() -> vec_case_when() -> vec_slice()
#> 15    alloc  11996112   case_when() -> vec_case_when() -> vec_slice()
#> 16    alloc   8400112 case_when() -> vec_case_when() -> vec_recycle()
#> 17    alloc  40000048 case_when() -> vec_case_when() -> list_unchop()
#> total       440000864

That’s a lot of allocations! And it’s pretty hard to understand where they are coming from without a bit more explanation. For that, we’re actually going to “manually” implement an underpowered version of case_when() for this example.
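Before walking through the implementation, it helps to know how to read the byte counts in the profile. The sketch below assumes a typical 64-bit build of R, where an integer or logical vector costs 4 bytes per element plus a 48-byte header; the header size is an assumption about the platform rather than something stated in the profile.

```r
n <- 1e7

# A full-length integer or logical vector: 4 bytes per element plus the
# (assumed) 48-byte vector header
full_length_bytes <- 4 * n + 48
full_length_bytes
#> [1] 40000048

# Smaller allocations can be read backwards into an element count.
# 7600664 bytes is the size of the first short `which()` result in the profile.
elements <- (7600664 - 48) / 4
elements
#> [1] 1900154

# As a fraction of the input length
elements / n
#> [1] 0.1900154
```

So the 40000048-byte rows are full-length vectors of 10 million elements, and the 7600664-byte rows hold about 19% of the locations, which is consistent with the first condition, column < 20, matching 19 of the 100 possible values.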

Here’s a diagram of what we need to accomplish:

In bullets:

x_condition selects the blue elements of x
y_condition selects the red elements of y
z_condition selects the green elements of z
A default is built around the unused locations
We combine all of the pieces into out

The trickiest part about case_when() is handling places where x_condition and y_condition overlap. In the image, even though both x and y are selected at location 5, only the value of x is retained since it is hit “first”. This forces us to have to modify y_condition to avoid already “used” locations.

An R implementation that computes these modified locations might look like:

n <- length(x_condition)

unused <- rep(TRUE, times = n) # 1

x_loc <- unused & x_condition # 2
x_loc <- which(x_loc) # 3,4
unused[x_loc] <- FALSE

y_loc <- unused & y_condition # 5
y_loc <- which(y_loc) # 6,7
unused[y_loc] <- FALSE

z_loc <- unused & z_condition # 8
z_loc <- which(z_loc) # 9,10
unused[z_loc] <- FALSE

Anything that is still unused falls through to the default:

default <- NA_integer_
default_loc <- which(unused) # 11,12

With x_loc, y_loc, z_loc, and default_loc in hand, we can build the output from the pieces:

out <- vector("integer", length = n) # 17

out[x_loc] <- x[x_loc] # 13
out[y_loc] <- y[y_loc] # 14
out[z_loc] <- z[z_loc] # 15

out[default_loc] <- rep(default, times = length(default_loc)) # 16

And sure enough, this is identical to case_when():

identical(
  out,
  case_when(
    x_condition ~ x,
    y_condition ~ y,
    z_condition ~ z
  )
)
#> [1] TRUE
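
To make the overlap rule concrete, here is the same sequence of steps run on a tiny input. The seven-element vectors and the condition patterns below are made up for illustration (only the overlap at location 5 mirrors the description above), so treat this as a sketch rather than the exact data behind the diagram:

```r
# Hypothetical toy inputs, not the benchmark data
x <- 1:7
y <- 11:17
z <- 21:27

x_condition <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
y_condition <- c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE) # overlaps x at 5
z_condition <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)

n <- length(x_condition)
unused <- rep(TRUE, times = n)

x_loc <- which(unused & x_condition)
unused[x_loc] <- FALSE

# Location 5 was already claimed by x, so it is dropped from y_loc
y_loc <- which(unused & y_condition)
unused[y_loc] <- FALSE

z_loc <- which(unused & z_condition)
unused[z_loc] <- FALSE

default_loc <- which(unused)

out <- vector("integer", length = n)
out[x_loc] <- x[x_loc]
out[y_loc] <- y[y_loc]
out[z_loc] <- z[z_loc]
out[default_loc] <- NA_integer_

y_loc
#> [1] 3
out
#> [1] NA  2 13 NA  5 26 27
```

Location 5 keeps the value from x, and locations 1 and 4, which no condition selected, fall through to the default.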

You might be wondering what all of the comments with numbers beside them mean. Those actually map 1:1 with the allocations that case_when() was emitting. In fact, we can now split up those allocations into their respective role:

#> # Tracking `unused` locations
#> 1     alloc  40000048     case_when() -> vec_case_when() -> vec_rep()

#> # Computing `x_loc`, `y_loc`, and `z_loc`
#> 2     alloc  40000048                  case_when() -> vec_case_when()
#> 3     alloc  40000056       case_when() -> vec_case_when() -> which()
#> 4     alloc   7600664       case_when() -> vec_case_when() -> which()
#> 5     alloc  40000048                  case_when() -> vec_case_when()
#> 6     alloc  40000056       case_when() -> vec_case_when() -> which()
#> 7     alloc  12003312       case_when() -> vec_case_when() -> which()
#> 8     alloc  40000048                  case_when() -> vec_case_when()
#> 9     alloc  40000056       case_when() -> vec_case_when() -> which()
#> 10    alloc  11996112       case_when() -> vec_case_when() -> which()

#> # Computing `default_loc`
#> 11    alloc  40000056       case_when() -> vec_case_when() -> which()
#> 12    alloc   8400112       case_when() -> vec_case_when() -> which()

#> # Slicing `x`, `y`, and `z` to align with `x_loc`, `y_loc`, and `z_loc`
#> 13    alloc   7600664   case_when() -> vec_case_when() -> vec_slice()
#> 14    alloc  12003312   case_when() -> vec_case_when() -> vec_slice()
#> 15    alloc  11996112   case_when() -> vec_case_when() -> vec_slice()

#> # Recycling `default` of `NA` to align with `default_loc`
#> 16    alloc   8400112 case_when() -> vec_case_when() -> vec_recycle()

#> # Final output container, which we assign `x`, `y`, `z`, and `default` into
#> # at locations `x_loc`, `y_loc`, `z_loc`, and `default_loc`
#> 17    alloc  40000048 case_when() -> vec_case_when() -> list_unchop()

We sought to remove every one of these allocations except for the last one, which is the final output container that is returned to the user. In other words, we were after this, which is the actual profmem result of this case_when() call in dplyr 1.2.0:

#> Rprofmem memory profiling of:
#> case_when(x_condition ~ x, y_condition ~ y, z_condition ~ z)
#>
#> Memory allocations (>= 1000 bytes):
#>        what    bytes                          calls
#> 1     alloc 40000048 case_when() -> vec_case_when()
#> total       40000048
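
That remaining allocation is roughly what a single integer vector of the output's length costs. As a rough check, assuming the 1e7-element integer inputs used for this benchmark and a typical 64-bit build of R, base R's object.size() can be compared against 4 bytes per element:

```r
n <- 1e7

# Size in bytes of one integer vector of length n
size_int <- as.numeric(object.size(integer(n)))

# Integers take 4 bytes each; the remainder is the vector's fixed header
payload <- 4 * n
header <- size_int - payload

c(total = size_int, payload = payload, header = header)
```

The header should come out as a few dozen bytes, which lines up with the 40000048 bytes reported by profmem.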

Sliced assignment

To work towards this, let’s focus on what happens to x throughout this process:

We had a hypothesis that we could cut out the intermediate work here. Ideally, we’d take the logical LHS x_condition and the RHS x and map that straight into the output, with no extra allocations:

But this just wasn’t possible with the way that assignment typically works in R!

x <- c("a", "b", "c", "d", "e", "f", "g")
x_condition <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
out <- vector("character", length = length(x_condition))

out[x_condition] <- x
#> Warning in out[x_condition] <- x: number of items to replace is not a multiple of replacement length

Instead, you must pre-slice x to a length that matches the locations that x_condition points to in out, i.e.:

out[x_condition] <- x[x_condition]

Now, in case_when() we don’t actually use [<- for assignment or [ for slicing. Instead, we use tools from vctrs, a low-level package for building consistent tidyverse functions. In this case, we’d use vctrs::vec_assign() and vctrs::vec_slice():

out <- vctrs::vec_assign(out, x_condition, vctrs::vec_slice(x, x_condition))

But vec_assign() had the same problem!

To solve this, we’ve added a new boolean argument to vec_assign() called slice_value. You use it like this:

out <- vctrs::vec_assign(out, x_condition, x, slice_value = TRUE)

With slice_value = TRUE, vec_assign() assumes that both out and x are the same length and that x_condition applies to both of these. Internally, rather than materializing x[x_condition], we instead just loop over both out and x at the same time (at C level) and copy over values from x whenever x_condition is TRUE.

This is huge! It means that allocations 13-15 from above related to slicing x, y, and z all disappear.
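
Conceptually, the slice_value = TRUE path behaves like the loop below. This is only an R-level sketch of the idea, using a hypothetical helper named assign_sliced(); the real implementation lives in vctrs's C code and handles many more types and edge cases:

```r
# Hypothetical helper illustrating the idea behind `slice_value = TRUE`
assign_sliced <- function(out, condition, value) {
  stopifnot(
    length(condition) == length(out),
    length(value) == length(out)
  )

  # Walk `out` and `value` in lockstep, never materializing `value[condition]`
  for (i in seq_along(out)) {
    if (isTRUE(condition[[i]])) {
      out[[i]] <- value[[i]]
    }
  }

  out
}

x <- c("a", "b", "c", "d", "e", "f", "g")
x_condition <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
out <- vector("character", length = length(x_condition))

result <- assign_sliced(out, x_condition, x)

# Compare against the pre-sliced base R form
expected <- out
expected[x_condition] <- x[x_condition]

identical(result, expected)
#> [1] TRUE
```

The point of the sketch is that value is read at the same position that out is written to, so no intermediate x[x_condition] vector is ever needed.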

Logical `i`ndices

You might have noticed that we’ve been using which() quite a bit in the above algorithm. This turns a logical vector of TRUE and FALSE into an integer vector of locations pointing to where the logical vector was TRUE:

x_condition <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
x_loc <- which(x_condition)
x_loc
#> [1] 1 3 6

We perform this conversion up front due to how the following works at C level:

out[x_condition] <- x[x_condition]

Both [ and [<- will convert a logical x_condition into the integer x_loc form before proceeding with the assignment, meaning that the which()-style conversion happens twice if we don’t do it once up front. And vctrs is the same way! Both vec_assign() and vec_slice() in the following call would convert x_condition to an integer vector:

vctrs::vec_assign(out, x_condition, vctrs::vec_slice(x, x_condition))

Now, with the previous optimization we’ve already seen that we can reduce this to:

vctrs::vec_assign(out, x_condition, x, slice_value = TRUE)

But vec_assign() still converts a logical x_condition to integer locations internally before doing the assignment. So now it doesn’t matter whether we do this conversion up front via which() or let vec_assign() do it: it still happens once per input. But we’d like to avoid it entirely!

The solution here wasn’t too magical; it just involved a good bit of grunt work. We’ve added a path in vec_assign()’s C code that can handle logical indices like x_condition directly, rather than forcing them to be converted to integer locations first.

But this is a huge win, because it means that allocations 1-10, which were all related to tracking unused locations and converting logical conditions to integer locations with which(), can now be removed. vec_assign() will just handle that optimally for us without any extra allocations.

The nice part about an optimization like this is that any other existing code that is using vec_assign() with a logical index will also benefit from this without having to change a thing!

`default` handling

The remaining allocations are 11-12 and 16, which all have to do with the implied default. Allocations 11-12 were about figuring out where to put default, and allocation 16 was about recycling a typed size 1 default to the right size before assigning it into out.

As it turns out, we don’t need any of this!

In vctrs, when we initialize any output container, we use vec_init():

vctrs::vec_init(integer(), n = 5)
#> [1] NA NA NA NA NA

vctrs::vec_init(tibble(x = integer(), y = character()), n = 5)
#> # A tibble: 5 × 2
#>       x y    
#>    
#> 1    NA NA   
#> 2    NA NA   
#> 3    NA NA   
#> 4    NA NA   
#> 5    NA NA

This already has the implied default assigned to every location. We then overwrite this with x, y, and z at the appropriate locations, but anything left untouched by those is still set to the default, so we’re done!

For cases where the user supplies their own default, things are slightly more complicated. We actually do have to compute a default_loc implied from x_condition, y_condition, and z_condition, but internally we do so using a C vector of bool (typically 1 byte per element, compared with the 4 bytes per element of R’s logical vector type), so the memory footprint is as small as it can be.
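
Here is a rough base R sketch of that bookkeeping for a user-supplied default. It uses an ordinary R logical vector where the real implementation uses a C array of bool, and the toy inputs and the variable name used are invented for the example:

```r
# Hypothetical toy conditions, including missing values
x_condition <- c(TRUE, FALSE, NA, FALSE, FALSE)
y_condition <- c(FALSE, TRUE, FALSE, FALSE, NA)
z_condition <- c(FALSE, TRUE, FALSE, FALSE, FALSE)

# Track which locations are hit by any condition.
# `%in% TRUE` treats NA as a miss, as case_when() does.
used <- rep(FALSE, times = length(x_condition))
for (condition in list(x_condition, y_condition, z_condition)) {
  used <- used | (condition %in% TRUE)
}

# Whatever is left over receives the user's default
default_loc <- !used
default_loc
#> [1] FALSE FALSE  TRUE  TRUE  TRUE
```

Only one extra logical-like vector is needed no matter how many conditions there are, which is why a compact bool array is enough for this step.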

The “first wins” conundrum

One thing we’ve skipped over is the “first wins” behavior of case_when() mentioned earlier. Now that we’ve removed x_loc, y_loc, and z_loc, which is where that was being handled, how do we keep this behavior without slowing things down?

To be explicit, we are talking about this feature of case_when() where only the first hit is kept when you have overlapping logical indices:

x <- c("x1", "x2", "x3")
y <- c("y1", "y2", "y3")
z <- c("z1", "z2", "z3")

x_condition <- c(TRUE, FALSE, TRUE)
y_condition <- c(TRUE, TRUE, FALSE)
z_condition <- c(FALSE, TRUE, TRUE)

case_when(
  x_condition ~ x,
  y_condition ~ y,
  z_condition ~ z
)
#> [1] "x1" "y2" "x3"

A naive approach doesn’t work, as you end up with “last wins” behavior:

out <- vctrs::vec_init(character(), n = 3)

out <- vctrs::vec_assign(out, x_condition, x, slice_value = TRUE)
out <- vctrs::vec_assign(out, y_condition, y, slice_value = TRUE)
out <- vctrs::vec_assign(out, z_condition, z, slice_value = TRUE)

# This is wrong!
out
#> [1] "y1" "z2" "z3"

identical(
  out,
  case_when(
    x_condition ~ x,
    y_condition ~ y,
    z_condition ~ z
  )
)
#> [1] FALSE

Instead, case_when() just iterates in reverse, assigning z, then y, then x:

out <- vctrs::vec_init(character(), n = 3)

out <- vctrs::vec_assign(out, z_condition, z, slice_value = TRUE)
out <- vctrs::vec_assign(out, y_condition, y, slice_value = TRUE)
out <- vctrs::vec_assign(out, x_condition, x, slice_value = TRUE)

identical(
  out,
  case_when(
    x_condition ~ x,
    y_condition ~ y,
    z_condition ~ z
  )
)
#> [1] TRUE
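
To watch the reverse pass at work without vctrs, the same ordering can be traced in base R. This sketch uses plain [<- (so it does not share the memory benefits discussed above) and keeps a snapshot of out after each step:

```r
x <- c("x1", "x2", "x3")
y <- c("y1", "y2", "y3")
z <- c("z1", "z2", "z3")

conditions <- list(
  x = c(TRUE, FALSE, TRUE),
  y = c(TRUE, TRUE, FALSE),
  z = c(FALSE, TRUE, TRUE)
)
values <- list(x = x, y = y, z = z)

out <- rep(NA_character_, times = 3)
steps <- list()

# Assign last-to-first so that earlier conditions overwrite later ones
for (name in rev(names(conditions))) {
  hit <- conditions[[name]]
  out[hit] <- values[[name]][hit]
  steps[[name]] <- out
}

steps$z
#> [1] NA   "z2" "z3"
steps$y
#> [1] "y1" "y2" "z3"
steps$x
#> [1] "x1" "y2" "x3"
```

Because x is assigned last, it overwrites anything y or z wrote at the same location, which is exactly the “first wins” result.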

This diagram demonstrates how that works:

Optimizing speed?

Now that we’ve optimized the memory usage of case_when(), you might be wondering if we did anything else to specifically optimize its speed. Not really! We have moved everything from R to C, but focusing our efforts on reducing memory also resulted in some pretty performant code, and there wasn’t much left to optimize after that.

`if_else()`

if_else() can actually be written as a form of case_when():

# This call...
if_else(condition, true, false, missing)

# ...can be written as:
case_when(
  condition ~ true,
  !condition ~ false,
  is.na(condition) ~ missing
)

In our actual C implementation of if_else(), for simple types like integer, character, or numeric vectors, we have an extremely fast path that’s even more optimized than this, but for anything with a class we pretty much use this exact case_when() approach.
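
The equivalence between if_else() and the case_when() form is easy to check on a condition that contains a missing value, which also shows why the is.na() branch is needed. The inputs here are made up for the example:

```r
library(dplyr)

condition <- c(TRUE, FALSE, NA)

via_if_else <- if_else(condition, "yes", "no", missing = "unknown")

via_case_when <- case_when(
  condition ~ "yes",
  # `!NA` is `NA`, so this branch does not match the missing value
  !condition ~ "no",
  is.na(condition) ~ "unknown"
)

via_if_else
#> [1] "yes"     "no"      "unknown"

identical(via_if_else, via_case_when)
#> [1] TRUE
```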

For package developers

If you’re a package developer, you’ll be happy to know that vctrs itself now exposes low dependency versions of if_else() and case_when(). Here’s the full family:

vec_if_else()
vec_case_when()
vec_replace_when()
vec_recode_values()
vec_replace_values()

dplyr::if_else() and friends are now just very thin wrappers over these. Feel free to use the vctrs versions in your package if you need the consistency of the tidyverse without the heavy-ish dependency of dplyr.

At the deepest level, `list_combine()`

At the deepest level of all of this is one final new vctrs function, list_combine(). This is a flexible way to combine multiple vectors together at locations specified by indices.

list_combine() powers all of vec_case_when(), vec_replace_when(), vec_recode_values(), vec_replace_values(), vec_if_else(), and even vec_c(), the tidyverse version of c(). For example, here is list_combine() reproducing a case_when() call:

set.seed(123)

column <- sample(100, size = 1e7, replace = TRUE)

x_condition <- column < 20
y_condition <- column < 50
z_condition <- column < 80

x <- sample(10, size = 1e7, replace = TRUE)
y <- sample(10, size = 1e7, replace = TRUE)
z <- sample(10, size = 1e7, replace = TRUE)

out <- vctrs::list_combine(
  x = list(x, y, z),

  # `indices` are allowed to be logical and aren't forced to integer
  indices = list(x_condition, y_condition, z_condition),
  size = length(x_condition),

  # When there are overlaps, take the "first"
  multiple = "first",

  # Same as `slice_value` from `vec_assign()`
  slice_x = TRUE
)

identical(
  out,
  case_when(
    x_condition ~ x,
    y_condition ~ y,
    z_condition ~ z
  )
)
#> [1] TRUE
