Engineering · May 5, 2026 · 8 min read

Scrapy Proxy Middleware Setup Guide

Set up NinjaProxy in Scrapy with downloader middleware, environment-based credentials, rotating gateway username controls, and per-spider proxy overrides.

Scrapy already gives you concurrency controls, retries, and item pipelines. The missing piece for production scraping is usually the proxy layer. If every request leaves through the same IP, the crawl eventually slows down under bans, rate limits, or geo mismatches.

This setup keeps the NinjaProxy endpoint model aligned with the public docs: copy the exact host and port from the portal, keep credentials in environment variables, and only add rotation controls where they belong.

When to use downloader middleware

You can set request.meta["proxy"] one request at a time, but middleware is the cleaner default for most projects.

  • One integration point for every spider in the project
  • Environment-based credentials instead of hard-coded proxy strings
  • Per-request overrides still available when a spider needs a different route
  • Easy rotation hooks if you later add sticky sessions or geo targeting

Use middleware when you want proxy behavior applied consistently across the crawl instead of repeating the same setup inside every spider.

Required portal values

Before touching Scrapy, copy three values from NinjaProxy:

  1. Your portal username
  2. Your per-user apiKey
  3. The exact HTTP endpoint from Portal → Proxy IPs or Rotating Gateway IPs

Do not invent hostnames or ports. Public docs use placeholders rather than real values because the actual endpoint is account-specific.

Middleware implementation

The minimal middleware only has to build one authenticated proxy URL and attach it to the outgoing request.

import os


def require_env(name: str) -> str:
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


class NinjaProxyMiddleware:
    def process_request(self, request, spider):
        # Leave per-request overrides set via request.meta["proxy"] intact.
        if "proxy" in request.meta:
            return
        username = require_env("NINJAPROXY_USERNAME")
        api_key = require_env("NINJAPROXY_API_KEY")
        http_endpoint = require_env("NINJAPROXY_HTTP_ENDPOINT")
        request.meta["proxy"] = f"http://{username}:{api_key}@{http_endpoint}"

That matches the example file in this repo at ipn-190-ninjasproxy-examples/python/scrapy_middleware.py, so the blog post and the sample stay consistent.

Enable the middleware in settings.py

Scrapy only runs downloader middleware after you register it.

DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.NinjaProxyMiddleware": 543,
}

If your project already uses retry, user-agent, or ban-detection middleware, keep those entries and add NinjaProxy alongside them. The important part is that the dotted import path matches the file where you defined the middleware class.
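As a sketch of how the entries coexist, here is a combined dict. The `myproject` path is illustrative; the priorities are Scrapy's documented defaults, and 543 keeps NinjaProxy before Scrapy's built-in HttpProxyMiddleware (750), which honors request.meta["proxy"] and sets the Proxy-Authorization header from the URL credentials.

```python
DOWNLOADER_MIDDLEWARES = {
    # Custom proxy middleware; runs before the built-in proxy handling.
    "myproject.middlewares.NinjaProxyMiddleware": 543,
    # Scrapy's RetryMiddleware at its default priority; keep it if enabled.
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
}
```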

Add rotating or sticky routes

For assigned/static endpoints, the basic URL is enough. For rotating gateways, keep the same endpoint and append controls to the username only.

def build_rotating_username(base_username: str, session_id: str) -> str:
    return (
        f"{base_username}"
        f"--session-{session_id}"
        f"--duration-90"
        f"--provider-res"
        f"--geo-country-us"
    )

Then use that routed username when building request.meta["proxy"]. Reuse the same session_id when you want a sticky IP for a short flow. Change or remove it when you want a fresh route.
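As a sketch of how the routed username slots into the proxy URL, with all credentials and the endpoint replaced by placeholders, not real portal values:

```python
def build_rotating_username(base_username: str, session_id: str) -> str:
    # Same helper as above: rotation controls append to the username only.
    return (
        f"{base_username}"
        f"--session-{session_id}"
        f"--duration-90"
        f"--provider-res"
        f"--geo-country-us"
    )


# Placeholder credentials and endpoint for illustration only.
routed = build_rotating_username("portal-user", "checkout-1")
proxy_url = f"http://{routed}:API_KEY@gateway.example:8080"
print(proxy_url)
```

The endpoint and API key stay the same as the static case; only the username carries the routing controls.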

Per-spider overrides

Middleware does not lock you into one proxy profile forever. A spider can still override the default route for a sensitive request.

def start_requests(self):
    for url in self.start_urls:
        yield scrapy.Request(
            url,
            meta={
                "proxy": "http://<USERNAME>--session-serp-us-1:<API_KEY>@<ROTATING_HTTP_ENDPOINT>"
            },
        )

That pattern is useful when one spider needs geo-targeting, a sticky checkout session, or a separate route for login pages.
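To keep credentials out of spider code, the override URL can be built from the same environment variables the middleware reads. `override_proxy_url` is a hypothetical helper for this post, not part of any NinjaProxy SDK:

```python
import os


def override_proxy_url(session_id: str) -> str:
    # Reads the same variables as the middleware; raises KeyError if unset.
    username = os.environ["NINJAPROXY_USERNAME"]
    api_key = os.environ["NINJAPROXY_API_KEY"]
    endpoint = os.environ["NINJAPROXY_HTTP_ENDPOINT"]
    return f"http://{username}--session-{session_id}:{api_key}@{endpoint}"
```

A spider would then pass meta={"proxy": override_proxy_url("serp-us-1")} instead of a hard-coded string.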

Verify the setup

After wiring the middleware, validate with a target that echoes your IP or headers before starting a full crawl.

  1. Export NINJAPROXY_USERNAME, NINJAPROXY_API_KEY, and NINJAPROXY_HTTP_ENDPOINT
  2. Run a small spider against https://ip.ninjasproxy.com/ or https://httpbin.org/ip
  3. Confirm requests succeed and the exit IP matches the route you expect
  4. Only then raise concurrency and crawl depth

This catches bad credentials and malformed endpoints early, before Scrapy fans the mistake across hundreds of requests.
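The environment half of that check can run as a preflight script before Scrapy starts. This is a sketch using only the standard library; the host:port shape is an assumption about how the portal presents endpoints:

```python
import os
from urllib.parse import urlsplit

REQUIRED = ("NINJAPROXY_USERNAME", "NINJAPROXY_API_KEY", "NINJAPROXY_HTTP_ENDPOINT")


def preflight() -> str:
    # Catch missing variables before Scrapy fans the mistake across requests.
    missing = [name for name in REQUIRED if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    endpoint = os.environ["NINJAPROXY_HTTP_ENDPOINT"]
    # urlsplit needs a leading "//" to parse the value as a network location.
    parts = urlsplit(f"//{endpoint}")
    if not parts.hostname or not parts.port:
        raise RuntimeError(f"Endpoint must look like host:port, got {endpoint!r}")
    return f"{parts.hostname}:{parts.port}"
```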

Common failure modes

  • 407 Proxy Authentication Required means the username, API key, or username-control syntax is wrong.
  • Timeouts on every request usually mean the endpoint was typed manually instead of copied from the portal.
  • No rotation happening usually means you reused the same --session-... token or never switched from a static endpoint to a rotating gateway.
  • Middleware not firing usually means the DOWNLOADER_MIDDLEWARES import path is wrong or the settings module was not loaded for that crawl.

Ready to implement?

Read Python Integration Docs →