Scrapy already gives you concurrency controls, retries, and item pipelines. The missing piece for production scraping is usually the proxy layer. If every request leaves through the same IP, the crawl eventually slows down under bans, rate limits, or geo mismatches.
This setup keeps the NinjaProxy endpoint model aligned with the public docs: copy the exact host and port from the portal, keep credentials in environment variables, and only add rotation controls where they belong.
When to use downloader middleware
You can set request.meta["proxy"] one request at a time, but middleware is the cleaner default for most projects.
- One integration point for every spider in the project
- Environment-based credentials instead of hard-coded proxy strings
- Per-request overrides still available when a spider needs a different route
- Easy rotation hooks if you later add sticky sessions or geo targeting
Use middleware when you want proxy behavior applied consistently across the crawl instead of repeating the same setup inside every spider.
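For the occasional one-off route, the request-level alternative is just a meta entry. A minimal sketch, assuming the proxy URL is assembled from the same three environment variables the middleware reads (the proxy_meta helper name is ours, not part of Scrapy or NinjaProxy):

```python
import os


def proxy_meta() -> dict:
    # Build the meta dict for a single proxied Request, reading the same
    # environment variables the project-wide middleware would use, so a
    # one-off request and the middleware produce identical proxy URLs.
    username = os.environ["NINJAPROXY_USERNAME"]
    api_key = os.environ["NINJAPROXY_API_KEY"]
    endpoint = os.environ["NINJAPROXY_HTTP_ENDPOINT"]
    return {"proxy": f"http://{username}:{api_key}@{endpoint}"}
```

A spider would then pass it as meta=proxy_meta() when yielding a scrapy.Request. Once more than one spider does this, the middleware below is the better home for the logic.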
Required portal values
Before touching Scrapy, copy three values from NinjaProxy:
- Your portal username
- Your per-user apiKey
- The exact HTTP endpoint from Portal → Proxy IPs or Rotating Gateway IPs
Do not invent hostnames or ports. Public docs use placeholders because the real endpoint is account-specific.
Middleware implementation
The minimal middleware only has to build one authenticated proxy URL and attach it to the outgoing request.
```python
import os


def require_env(name: str) -> str:
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


class NinjaProxyMiddleware:
    def process_request(self, request, spider):
        username = require_env("NINJAPROXY_USERNAME")
        api_key = require_env("NINJAPROXY_API_KEY")
        http_endpoint = require_env("NINJAPROXY_HTTP_ENDPOINT")
        request.meta["proxy"] = f"http://{username}:{api_key}@{http_endpoint}"
```

That matches the example file in this repo at ipn-190-ninjasproxy-examples/python/scrapy_middleware.py, so the blog post and the sample stay consistent.
Enable the middleware in settings.py
Scrapy only runs downloader middleware after you register it.
```python
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.NinjaProxyMiddleware": 543,
}
```

If your project already uses retry, user-agent, or ban-detection middleware, keep those entries and add NinjaProxy alongside them. The important part is that the dotted import path matches the file where you defined the middleware class.
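As a sketch of coexisting entries, assuming Scrapy's built-in RetryMiddleware at its default priority of 550, the combined setting might look like this (the myproject.middlewares path is the same placeholder as above):

```python
# settings.py: NinjaProxy registered alongside Scrapy's built-in retry middleware.
# Lower numbers run first for process_request, so 543 attaches the proxy
# before RetryMiddleware (550) sees the request.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
    "myproject.middlewares.NinjaProxyMiddleware": 543,
}
```

Keeping the retry entry at its default number means retries re-enter the chain normally and pick up a proxy the same way first attempts do.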
Add rotating or sticky routes
For assigned/static endpoints, the basic URL is enough. For rotating gateways, keep the same endpoint and append controls to the username only.
```python
def build_rotating_username(base_username: str, session_id: str) -> str:
    return (
        f"{base_username}"
        f"--session-{session_id}"
        f"--duration-90"
        f"--provider-res"
        f"--geo-country-us"
    )
```

Then use that routed username when building request.meta["proxy"]. Reuse the same session_id when you want a sticky IP for a short flow. Change or remove it when you want a fresh route.
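Putting the pieces together, one way to sketch the full rotating proxy URL is a helper that appends the same username controls shown above and then adds credentials and the portal endpoint (build_rotating_proxy_url is our name; the --session/--duration/--provider/--geo tokens mirror this article's example, so check your portal for the exact control syntax your plan supports):

```python
def build_rotating_proxy_url(
    base_username: str, api_key: str, endpoint: str, session_id: str
) -> str:
    # Controls are appended to the username only; the endpoint stays
    # exactly as copied from the portal.
    routed = (
        f"{base_username}"
        f"--session-{session_id}"
        f"--duration-90"
        f"--provider-res"
        f"--geo-country-us"
    )
    return f"http://{routed}:{api_key}@{endpoint}"
```

The middleware's process_request can call this instead of building the plain URL whenever a spider sets a session id in request.meta.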
Per-spider overrides
Middleware does not lock you into one proxy profile forever. A spider can still override the default route for a sensitive request.
```python
def start_requests(self):
    for url in self.start_urls:
        yield scrapy.Request(
            url,
            meta={
                "proxy": "http://<USERNAME>--session-serp-us-1:<API_KEY>@<ROTATING_HTTP_ENDPOINT>"
            },
        )
```

That pattern is useful when one spider needs geo-targeting, a sticky checkout session, or a separate route for login pages.
Verify the setup
After wiring the middleware, validate with a target that echoes your IP or headers before starting a full crawl.
- Export NINJAPROXY_USERNAME, NINJAPROXY_API_KEY, and NINJAPROXY_HTTP_ENDPOINT
- Run a small spider against https://ip.ninjasproxy.com/ or https://httpbin.org/ip
- Confirm requests succeed and the exit IP matches the route you expect
- Only then raise concurrency and crawl depth
This catches bad credentials and malformed endpoints early, before Scrapy fans the mistake across hundreds of requests.
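A quick way to run that check outside Scrapy is a small stdlib script. This is a sketch (the helper names are ours; the echo endpoints are the ones from the checklist above), useful for separating proxy problems from spider problems:

```python
import json
import urllib.request


def proxy_handlers(proxy_url: str) -> dict:
    # Route both http and https traffic through the same authenticated proxy.
    return {"http": proxy_url, "https": proxy_url}


def check_exit_ip(proxy_url: str, test_url: str = "https://httpbin.org/ip") -> str:
    # Fetch the echo endpoint through the proxy and return the reported exit IP.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler(proxy_handlers(proxy_url))
    )
    with opener.open(test_url, timeout=15) as resp:
        return json.loads(resp.read())["origin"]
```

If this script fails, fix the proxy URL before blaming Scrapy; if it succeeds but the spider fails, the problem is in the middleware wiring.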
Common failure modes
- 407 Proxy Authentication Required means the username, API key, or username-control syntax is wrong.
- Timeouts on every request usually mean the endpoint was typed manually instead of copied from the portal.
- No rotation happening usually means you reused the same --session-... token or never switched from a static endpoint to a rotating gateway.
- Middleware not firing usually means the DOWNLOADER_MIDDLEWARES import path is wrong or the settings module was not loaded for that crawl.
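To tell a malformed proxy string apart from a genuine auth or network failure, a small structural check can run before Scrapy ever uses the URL. A sketch using only the standard library (validate_proxy_url is a hypothetical helper, not a NinjaProxy or Scrapy API):

```python
from urllib.parse import urlsplit


def validate_proxy_url(proxy_url: str) -> list:
    """Return a list of structural problems with a proxy URL (empty if it looks sane)."""
    problems = []
    parts = urlsplit(proxy_url)
    if parts.scheme != "http":
        problems.append("expected an http:// scheme for the HTTP proxy endpoint")
    if not parts.username or not parts.password:
        problems.append("missing username:api_key credentials before the @")
    if not parts.hostname or parts.port is None:
        problems.append("missing host:port (copy the endpoint from the portal)")
    return problems
```

Running this on the assembled URL at spider startup turns a silent 407 or timeout into an immediate, readable error.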
