How to use Screaming Frog custom extraction to scrape structured data

Key Takeaways

Screaming Frog’s Custom Extraction feature scrapes structured data precisely using XPath, CSSPath, and Regex, which makes it well suited to pulling JSON-LD, Microdata, RDFa, and other key on-page elements.
With JavaScript rendering enabled, Screaming Frog can extract both static and JavaScript-rendered content, which makes it effective for modern, dynamic websites.
Structured data extraction supports Schema.org validation, rich snippet optimization, competitive analysis, and large-scale data collection for SEO and content insights.
Custom Extraction allows targeted audits, such as pulling product details, authors, reviews, event info, or metadata for comparison, cleanup, or reporting.
Boost efficiency by testing selectors before full crawls, combining XPath + Regex for JSON-LD, and using Rows to Columns Converter to clean and format output for analysis.
Integrating Custom Extraction into SEO workflows enables continuous monitoring of structured data health, ensuring compliance with Google standards and maximizing rich result potential.

Extracting structured data from websites can be a game changer for SEO audits, content analysis, and data-driven decision making. Screaming Frog’s Custom Extraction feature offers a powerful way to scrape exactly the data you need, whether it’s hidden in HTML, embedded in JavaScript-rendered content, or formatted as JSON-LD, Microdata, or RDFa. This guide breaks down how to harness this tool effectively for structured data scraping.

Understanding Screaming Frog’s Custom Extraction

Screaming Frog is known for its website crawling capabilities, but its Custom Extraction feature goes further by letting you pull out specific data points using XPath, CSSPath, and Regex selectors. You are not limited to page titles or meta descriptions: you can target any element on a page, including structured data embedded in various formats. Customizing extraction this way can save hours of manual work, especially on large websites or complex data structures.

What sets this feature apart is its ability to handle both static HTML and JavaScript-rendered content. Many tools struggle with JavaScript-heavy sites, but with JavaScript rendering enabled, Screaming Frog can extract data even when it is loaded dynamically. This flexibility matters for modern websites that rely heavily on client-side rendering. The Custom Extraction feature also lets you create multiple extraction configurations, so you can tailor your approach to different projects or clients.

Why Structured Data Extraction Matters

Structured data, such as JSON-LD, Microdata, and RDFa, provides search engines with explicit clues about the meaning of a page’s content. Extracting and analyzing this data can help you validate your Schema.org implementations, identify opportunities for rich snippets, and ensure compliance with Google’s requirements. Well-implemented structured data can improve your site’s visibility in search results and can raise click-through rates, because rich snippets add details such as star ratings, product prices, and event dates to your listings.

Screaming Frog’s Structured Data Analysis tool complements Custom Extraction by validating structured data formats, but when you want to pull specific fields or perform custom audits, Custom Extraction is the better fit. It allows targeted data retrieval and supports continuous monitoring of your structured data implementations, so you can re-check your markup quickly when search engines change their requirements.

Setting Up Custom Extraction for Structured Data

Getting started with Custom Extraction involves configuring the selectors that tell Screaming Frog what to scrape. Here’s a step-by-step approach:

1. Choose Your Extraction Method

Screaming Frog supports three main selector types:

XPath: Ideal for navigating XML and HTML trees, perfect for targeting elements based on their hierarchy and attributes.
CSSPath: Useful when you prefer CSS selectors, often simpler for classes and IDs.
Regex: Best for extracting patterns within text, such as phone numbers or email addresses embedded in content.

For structured data embedded as JSON-LD scripts, XPath combined with Regex often works best to isolate and parse the JSON content.
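To make the Regex option more concrete, here is a minimal Python sketch that runs an email pattern against a sample HTML fragment. The markup, the addresses, and the pattern itself are assumptions made up for illustration, and Screaming Frog's own regex engine may behave differently in edge cases, so treat this as a way to reason about a pattern rather than a copy of the tool's behaviour.

```python
import re

# Hypothetical page fragment, invented for this example.
sample_html = """
<footer>
  <p>Sales: <a href="mailto:sales@example.com">sales@example.com</a></p>
  <p>Call us on +1 801-555-0142 or email support@example.com</p>
</footer>
"""

# A deliberately simple email pattern; real-world addresses can be more varied.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return unique email addresses in order of first appearance."""
    seen: list[str] = []
    for match in EMAIL_PATTERN.findall(html):
        if match not in seen:
            seen.append(match)
    return seen


emails = extract_emails(sample_html)
print(emails)
```

The same address appears twice in the sample (once in the mailto link, once as link text), which is why the helper de-duplicates before returning.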

2. Identify Your Target Data

Before configuring extraction, inspect the page source or use browser developer tools to locate the structured data you want. For example, JSON-LD is typically found within <script type="application/ld+json"> tags. Microdata and RDFa are embedded directly in HTML elements with specific attributes.

Once identified, write selectors that precisely target these elements. For JSON-LD, an XPath like //script[@type='application/ld+json'] extracts the entire JSON block, which you can then parse with Regex to isolate specific fields. This meticulous approach is crucial, especially when dealing with large datasets or when the structured data varies significantly between pages. Additionally, understanding the structure of the JSON-LD will allow you to extract nested properties effectively, enhancing the richness of the data you collect.
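The idea of isolating the JSON-LD block and then reading nested properties can be sketched outside Screaming Frog in plain Python. The example below is only an illustration under assumptions: the sample page, the product values, and the helper names (JsonLdCollector, get_nested) are hypothetical, and it uses Python's standard HTML parser as a stand-in for what the XPath //script[@type='application/ld+json'] targets.

```python
import json
from html.parser import HTMLParser

# Hypothetical page, invented for this example.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Trail Shoe",
 "offers": {"@type": "Offer", "price": "89.99", "priceCurrency": "USD"},
 "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.6"}}
</script>
</head><body></body></html>
"""


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def get_nested(data, dotted_path):
    """Walk a dotted path such as 'offers.price'; return None if any key is missing."""
    current = data
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


collector = JsonLdCollector()
collector.feed(sample_html)
product = json.loads(collector.blocks[0])

print(product["name"])
print(get_nested(product, "offers.price"))
print(get_nested(product, "aggregateRating.ratingValue"))
```

Knowing the nesting in advance (for example that price sits under offers) is what lets you write a selector or pattern that targets the right field rather than the first similar-looking one.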

3. Test and Refine Your Selectors

Use Screaming Frog’s extraction preview to validate your selectors on sample URLs before running a full crawl. This step ensures accuracy and saves time by preventing extraction errors.

Expert SEO consultant Zoren Pamolarcon highlights that leveraging XPath and Regex in Custom Extraction significantly enhances technical SEO audits by enabling highly targeted data collection. This precision is invaluable when dealing with complex or inconsistent markup across pages. Selectors also need maintenance: website updates or markup changes can cause extractions to fail, so revisit and re-test your selectors periodically to keep your data accurate and relevant.
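The same test-first habit can be rehearsed outside the tool. The sketch below is a hypothetical dry run in Python: it applies one price pattern to a few sample pages and reports where it fails. The URLs, the markup, and the pattern are all assumptions for illustration, not output from Screaming Frog.

```python
import re

# Hypothetical pattern: capture a numeric "price" value, quoted or unquoted.
PRICE_PATTERN = re.compile(r'"price"\s*:\s*"?([0-9]+(?:\.[0-9]+)?)"?')

# Hypothetical sample pages with slightly different markup.
samples = {
    "https://example.com/p/1": '<script type="application/ld+json">{"offers": {"price": "89.99"}}</script>',
    "https://example.com/p/2": '<script type="application/ld+json">{"offers": {"price": 74.5}}</script>',
    "https://example.com/p/3": '<script type="application/ld+json">{"offers": {"lowPrice": "39.00"}}</script>',
}


def dry_run(pattern, pages):
    """Map each URL to the first captured value, or None when the pattern misses."""
    results = {}
    for url, html in pages.items():
        match = pattern.search(html)
        results[url] = match.group(1) if match else None
    return results


results = dry_run(PRICE_PATTERN, samples)
for url, value in results.items():
    print(url, "->", value if value is not None else "NO MATCH")
```

The third page uses lowPrice instead of price, so the pattern misses it. That is the kind of markup variation a small sample run is meant to surface before a full crawl.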

Practical Use Cases for Custom Extraction

Custom Extraction is versatile. Here are some common scenarios where it shines:

Extracting Rich Snippet Data

Rich snippets rely on structured data to display enhanced search results. Using Custom Extraction, you can pull out fields like product prices, reviews, event dates, or recipe details from JSON-LD or Microdata. This helps verify that your markup is complete and consistent across your site.
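Once the fields are extracted, a completeness check is simple to script. The following Python sketch assumes the extraction has already been loaded as one dictionary per URL. The field checklist and the sample rows are hypothetical and are not an official list of what Google requires for any rich result type.

```python
# Hypothetical checklist for this example only; adjust to the schema type you audit.
REQUIRED_FIELDS = ("name", "price", "ratingValue")

# Hypothetical extraction output, one dict per crawled URL.
extracted = [
    {"url": "https://example.com/p/1", "name": "Trail Shoe", "price": "89.99", "ratingValue": "4.6"},
    {"url": "https://example.com/p/2", "name": "Road Shoe", "price": "", "ratingValue": "4.1"},
    {"url": "https://example.com/p/3", "name": "Sandal", "price": "39.00"},
]


def missing_fields(row, required=REQUIRED_FIELDS):
    """List required fields that are absent or blank in one extracted row."""
    return [field for field in required if not str(row.get(field, "")).strip()]


# Keep only URLs that have at least one gap.
report = {}
for row in extracted:
    gaps = missing_fields(row)
    if gaps:
        report[row["url"]] = gaps

for url, gaps in report.items():
    print(url, "is missing:", ", ".join(gaps))
```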

Auditing Schema.org Implementations

While Screaming Frog’s Structured Data Analysis tool validates markup syntax, Custom Extraction lets you gather raw data for deeper analysis. For example, you might extract all “author” names from article schema to check for missing or incorrect attributions.
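An author audit like this could be scripted against an exported CSV. The sketch below is a rough Python illustration: the column names (Address, Author 1), the sample rows, and the placeholder-name list are assumptions, so check them against the headers in your own export.

```python
import csv
import io
from collections import Counter

# Hypothetical export; the column names are assumed, not taken from a real crawl.
export_csv = """Address,Author 1
https://example.com/blog/a,Jane Doe
https://example.com/blog/b,
https://example.com/blog/c,jane doe
https://example.com/blog/d,Admin
"""

# Hypothetical generic bylines that usually signal an incorrect attribution.
PLACEHOLDER_AUTHORS = {"admin", "staff", "editor"}


def audit_authors(csv_text, author_column="Author 1"):
    """Return (URLs with no author, URLs with placeholder authors, author counts)."""
    missing, placeholder = [], []
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        author = (row.get(author_column) or "").strip()
        if not author:
            missing.append(row["Address"])
        elif author.lower() in PLACEHOLDER_AUTHORS:
            placeholder.append(row["Address"])
        else:
            # Lower-case so "Jane Doe" and "jane doe" are counted together.
            counts[author.lower()] += 1
    return missing, placeholder, counts


missing, placeholder, counts = audit_authors(export_csv)
print("Missing author:", missing)
print("Placeholder author:", placeholder)
print("Author counts:", dict(counts))
```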

Competitive Analysis and Data Aggregation

Beyond SEO, Custom Extraction can scrape structured data from competitor sites or industry aggregators. This can inform pricing strategies, content gaps, or market trends by collecting standardized data points at scale.

Optimizing Your Workflow with Data Cleanup

Raw extracted data often needs reformatting before it’s ready for analysis. Jason Melman, an SEO workflow expert, recommends using Screaming Frog’s Rows to Columns Converter tool to clean up and reorganize Custom Extraction output. This step transforms data from a vertical list into a more usable tabular format, speeding up downstream processing in spreadsheets or databases.

Incorporating this cleanup into your workflow ensures that the data you extract is actionable, not just collected.
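To show what a vertical-to-tabular transformation involves, here is a minimal Python sketch of the reshaping step. It is not the converter tool's implementation, only an illustration under assumptions: the input is taken to be (URL, field, value) triples, and the sample values are invented.

```python
from collections import defaultdict

# Hypothetical "vertical" output: one row per URL/field pair.
long_rows = [
    ("https://example.com/p/1", "name", "Trail Shoe"),
    ("https://example.com/p/1", "price", "89.99"),
    ("https://example.com/p/2", "name", "Road Shoe"),
    ("https://example.com/p/2", "price", "74.50"),
    ("https://example.com/p/2", "ratingValue", "4.1"),
]


def rows_to_columns(rows):
    """Pivot (url, field, value) triples into a header row plus one row per URL."""
    fields = []                 # column order = order in which fields first appear
    by_url = defaultdict(dict)  # url -> {field: value}
    for url, field, value in rows:
        if field not in fields:
            fields.append(field)
        by_url[url][field] = value
    table = [["url"] + fields]
    for url, values in by_url.items():
        # Fill gaps with an empty string so every row has the same width.
        table.append([url] + [values.get(field, "") for field in fields])
    return table


table = rows_to_columns(long_rows)
for line in table:
    print(line)
```

Each URL ends up on a single row with one column per field, which is the shape most spreadsheet formulas and database imports expect.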

Advanced Developments in Data Extraction

While Screaming Frog offers robust extraction capabilities, recent research points to exciting advances in the field. For instance, the SCRIBES framework uses reinforcement learning to generate reusable extraction scripts, improving efficiency for large-scale semi-structured data scraping.

Additionally, the ChatSchema method combines Large Multimodal Models with Optical Character Recognition to extract and structure data from unstructured sources like medical reports. Though these techniques are beyond the scope of typical SEO tasks, they illustrate where data extraction technology is heading: toward greater automation and accuracy.

Tips for Maximizing Custom Extraction Success

Start Small: Test your selectors on a handful of URLs before scaling up.
Use Rendering Mode: Enable JavaScript rendering in Screaming Frog to capture dynamic content.
Combine Selectors: Use XPath to isolate elements and Regex to parse text within those elements for granular control.
Validate Regularly: Cross-check extracted data against source pages to catch errors early.
Document Your Selectors: Keep a record of your extraction logic for future audits or updates.

Conclusion

Screaming Frog’s Custom Extraction is a powerful tool for scraping structured data, offering flexibility through XPath, CSSPath, and Regex selectors. It supports modern web technologies by handling JavaScript-rendered content and multiple structured data formats like JSON-LD, Microdata, and RDFa.

By mastering Custom Extraction, SEO professionals and data analysts can gain precise insights into structured data implementations, validate rich snippet readiness, and gather competitive intelligence. Coupled with data cleanup tools and an understanding of advanced extraction methods, it becomes an indispensable part of any technical SEO toolkit.

Written by

Larry Norris

Founder & CEO, RedSEO

Larry built RedSEO after seven years in agency SEO — leading campaigns across industries, earning top-three rankings, and securing AI overviews. He's hands-on with every client strategy and publishes data-driven SEO insights from the field.
