Getting Started

ScrapeUp API is designed to simplify web scraping. A few things to consider before we get started:

  • Each request is retried until it succeeds, for up to 60 seconds. Set your client timeout to at least 70 seconds to allow for retries. If no attempt succeeds within the 60-second window, we return a 500 error. You are not charged for unsuccessful requests (only responses with 200 and 404 status codes are billed).
  • If you exceed your plan's concurrent connection limit, the API will respond with a 429 status code. Slow down your request rate to resolve this.
  • Each request returns the raw HTML from the target page, along with response headers and cookies.
  • You can scrape images, PDFs, or other files just as you would any other URL. There is a response size limit of approximately 8 MB per request.

Basic Usage

Send a GET request to http://api.scrapeup.com with two query parameters: api_key (your API key) and url (the URL to scrape). Make sure to URL-encode the target URL to avoid conflicts with query parameter separators.

Code:
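
A minimal Python sketch using only the standard library to build the request URL (the target URL is a placeholder, and the live call is left commented out because it needs the third-party requests package):

```python
from urllib.parse import urlencode

# urlencode() percent-encodes the target URL so its own "?" and "&"
# don't clash with the api.scrapeup.com query parameters.
params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/anything?page=2",
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)

# Live call (requires the third-party `requests` package):
# import requests
# html = requests.get(request_url, timeout=70).text  # timeout >= 70s for retries
```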

Result:

POST Requests

Use a POST request to send payloads to the target URL. Send a JSON body to http://api.scrapeup.com/ with your parameters.

Sending JSON data:

Use the json field. It accepts either a JSON object or a stringified JSON string.
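
A sketch of the request body, assuming the field names shown above (httpbin.org stands in for a real target; the live POST is commented out):

```python
import json

payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/post",
    "json": {"title": "Hello", "views": 42},  # forwarded as the target's POST body
}
body = json.dumps(payload)
print(body)

# import requests
# resp = requests.post("http://api.scrapeup.com/", json=payload, timeout=70)
```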

Sending form data:

Use the form field with URL-encoded form data.
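
A sketch of a form-data request; urlencode() produces the URL-encoded string the form field expects:

```python
from urllib.parse import urlencode

payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/post",
    # Forwarded to the target as application/x-www-form-urlencoded
    "form": urlencode({"username": "alice", "page": "2"}),
}
print(payload["form"])

# import requests
# resp = requests.post("http://api.scrapeup.com/", json=payload, timeout=70)
```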

Custom content types:

To send a content type other than application/json or application/x-www-form-urlencoded, set the Content-Type header and enable keep_headers.
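
A sketch for a custom content type; note that passing the raw body as the request data while the ScrapeUp parameters ride in the query string is an assumption about the transport, so verify against your dashboard examples:

```python
from urllib.parse import urlencode

# keep_headers=true tells the API to forward our headers to the target.
params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/post",
    "keep_headers": "true",
})
headers = {"Content-Type": "application/xml"}
xml_body = "<note><to>Team</to></note>"
print(headers["Content-Type"])

# import requests
# resp = requests.post("http://api.scrapeup.com?" + params,
#                      headers=headers, data=xml_body, timeout=70)
```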

Notes:

  1. By default, POST requests are forwarded as POST to the target. PUT requests are forwarded as PUT. Override this with the method parameter.
  2. All GET parameters (country_code, render, etc.) work in POST requests as body fields.
  3. The method parameter accepts: GET, POST, PUT, PATCH, HEAD, DELETE, OPTIONS.

AI Extraction

ScrapeUp's AI extraction engine lets you describe the data you want in plain English and receive clean, structured JSON back. No CSS selectors, no XPath, no regex — just tell the API what you need.

Schema-Based Extraction

Use the extract parameter with a JSON object where keys are the field names you want and values describe what to extract. This gives you precise control over the output structure.
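
A sketch of an extraction schema (the target URL and field descriptions are illustrative):

```python
# Keys become the output field names; values are plain-English
# descriptions of what the AI should pull from the page.
payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/product/123",
    "extract": {
        "name": "the product name",
        "price": "the current price as a number",
        "in_stock": "whether the product is in stock, true or false",
    },
}
print(sorted(payload["extract"]))

# import requests
# data = requests.post("http://api.scrapeup.com/", json=payload, timeout=70).json()
```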

Result:

Prompt-Based Extraction

Use the extract_prompt parameter with a natural language instruction. The AI will interpret your prompt and return structured data accordingly. This is useful when you want the AI to decide the best structure for the data.
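
A sketch of a prompt-based request; note that extract is omitted, since the two parameters are mutually exclusive:

```python
payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/blog",
    # The AI decides the output structure from this instruction.
    "extract_prompt": "List every article with its title, author, and publication date.",
}
print(payload["extract_prompt"])

# import requests
# data = requests.post("http://api.scrapeup.com/", json=payload, timeout=70).json()
```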

Result:

Notes:

  • extract and extract_prompt are mutually exclusive — use one or the other, not both.
  • Extraction works with any page. Combine with render=true for JavaScript-rendered content.
  • Set include_html=true to also receive the raw HTML alongside extracted data.

Extraction Models

Choose an extraction model based on your speed, accuracy, and cost requirements using the extract_model parameter. If not specified, defaults to fast.

| Model     | Best For                                                                                  | Credit Cost |
|-----------|-------------------------------------------------------------------------------------------|-------------|
| fast      | High-volume extraction where speed matters most. Great for simple, well-structured pages. | +5 credits  |
| balanced  | Good balance of accuracy and cost. Handles moderately complex pages well.                 | +8 credits  |
| precision | Higher accuracy for complex or messy pages. Better at following nuanced instructions.     | +10 credits |
| ultra     | Maximum accuracy. Best for complex reasoning, multi-step extraction, and difficult pages. | +15 credits |

Extraction credits are added to the base request cost. For example, a standard scrape (1 credit) with extract_model=fast (+5 credits) costs 6 credits total. A rendered scrape (10 credits) with extract_model=precision (+10 credits) costs 20 credits total.

Specifying a Model:
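
A sketch that selects a model tier and works through the credit arithmetic from the table above:

```python
payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/product/123",
    "extract": {"name": "the product name", "price": "the current price"},
    "extract_model": "precision",  # fast | balanced | precision | ultra
}

# Cost check: standard scrape (1 credit) + precision surcharge (10 credits)
base_cost, model_surcharge = 1, 10
print(base_cost + model_surcharge)  # 11 credits total
```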

Extraction Response

When extraction succeeds, the response includes:

  • extracted — the structured data extracted from the page
  • extraction_model — the model used for extraction
  • credits_used — the total credits charged

If extraction fails (e.g., the page has no relevant content), the response includes:

  • extracted: null
  • extraction_error — a message explaining what went wrong
  • html — the raw HTML is returned instead

You are not charged the extraction surcharge when extraction fails — only the base scrape cost applies.

Output Formats

Use the output parameter to convert scraped HTML into a different format without using AI extraction. This is useful when you want clean content but don't need structured data.

Markdown

Set output=markdown to convert the page HTML to clean Markdown. Ideal for feeding content into your own processing pipelines or LLMs.
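
A sketch of a Markdown-conversion request; the boolean-free output parameter is passed as a plain query value:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/article",
    "output": "markdown",  # or "text" for plain text; either costs +5 credits
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)

# import requests
# markdown = requests.get(request_url, timeout=70).text
```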

Result:

Plain Text

Set output=text to strip all HTML and return only the visible text content.

Markdown and text conversion costs +5 credits on top of the base request cost. Set include_html=true to also receive the original HTML alongside the converted output.

Rendering JavaScript

For pages that require JavaScript to render content, set render=true. We use a headless Google Chrome instance with anti-detection measures to fetch the fully rendered page. The browser waits for network activity to settle before returning the HTML.

Code:
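
A sketch of a render request; passing the flag as the literal string true is an assumption about the query-string encoding of boolean parameters:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/spa",
    "render": "true",  # headless Chrome; 10 credits per request
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```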

Each render request is charged at 10x the normal rate (1 render request = 10 API credits).

Lazy Loading

For pages that load content as you scroll (infinite scroll, lazy-loaded images), combine render=true with lazy_load=true. The browser will scroll to the bottom of the page to trigger all lazy-loaded content before returning the HTML.

Code:
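
A sketch combining the two flags; lazy_load only has an effect alongside render:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/feed",
    "render": "true",
    "lazy_load": "true",  # scroll to the bottom before returning the HTML
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```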

Note: Lazy loading increases the request time. The scroll timeout is up to 90 seconds on top of the render timeout.

Screenshots

To capture a screenshot of a rendered page, send a PUT request with render=true. The response will include a base64-encoded screenshot of the page.

Code:
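
A sketch of the PUT request and the base64 decode step; the screenshot field name in the JSON response is an assumption, so check an actual response for the exact key:

```python
import base64
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "render": "true",
})
request_url = "http://api.scrapeup.com?" + params

# Live call (field name `screenshot` is an assumption):
# import requests
# resp = requests.put(request_url, timeout=120).json()
# with open("page.png", "wb") as f:
#     f.write(base64.b64decode(resp["screenshot"]))

# Demonstrate the decode step on a stand-in value:
demo = base64.b64encode(b"\x89PNG").decode()
print(base64.b64decode(demo))
```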

SPA / Hash Routing

For Single Page Applications that use hash-based routing (Angular, React, Vue), use the hash parameter with render=true. After the page loads, the browser will navigate to the specified hash route and wait for the content to render.

Code:
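
A sketch of a hash-routing request; urlencode() handles the %23 escaping described below (it also encodes the slash, which is harmless):

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/app",
    "render": "true",
    "hash": "#/dashboard",  # encoded to %23%2Fdashboard automatically
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```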

The hash value should be URL-encoded (e.g., #/dashboard becomes %23/dashboard).

Custom Headers

To pass custom headers (User-Agent, cookies, etc.) to the target site, set keep_headers=true. Your request headers will be forwarded to the target. Only use this for customized results — we handle anti-blocking internally.

Code:
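
A sketch that forwards a custom header and cookie; the header names are illustrative:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/headers",
    "keep_headers": "true",
})
request_url = "http://api.scrapeup.com?" + params
# These are forwarded to the target (Host, Accept-Encoding, and
# Connection are always overridden for security).
headers = {"X-My-Header": "hello", "Cookie": "session=abc123"}
print(request_url)

# import requests
# html = requests.get(request_url, headers=headers, timeout=70).text
```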

Result:

Note: Some headers are always overridden for security (Host, Accept-Encoding, Connection).

Sessions

To reuse the same proxy IP across multiple requests, use the session_number parameter with any string value. All requests with the same session number will route through the same proxy IP. Send a different session number to get a new IP.

Code:
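
A sketch of session reuse; the session value is an arbitrary string you choose:

```python
from urllib.parse import urlencode

def scrape_url(target, session):
    # The same session value routes every request through the same proxy IP.
    return "http://api.scrapeup.com?" + urlencode({
        "api_key": "YOUR_API_KEY",
        "url": target,
        "session_number": session,
    })

first = scrape_url("https://httpbin.org/ip", "session-7")
second = scrape_url("https://httpbin.org/ip", "session-7")  # same IP as `first`
print(first == second)
```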

Result:

Geographic Location

Route your requests through a specific country using the country_code parameter with a two-letter country code (e.g., us, uk, de).

Code:
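
A sketch routing a request through Germany; httpbin.org/ip echoes back the IP the target sees:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/ip",
    "country_code": "de",  # two-letter code; availability depends on plan
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```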

Available countries depend on your plan:

  • Startup: United States (us)
  • Business: US, Canada (ca), United Kingdom (uk), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Japan (jp), China (cn), Australia (au)
  • Enterprise: All countries available upon request

When using premium=true, a broader set of residential proxy countries is available.

Premium Residential Proxies

Our standard proxy pool includes millions of datacenter proxies and handles the vast majority of scraping jobs. For particularly difficult sites, we also maintain a pool of residential IPs. Set premium=true to use this pool.

Code:
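
A sketch of a premium request; the flag is the only change from a standard call:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "premium": "true",  # residential pool: 25 credits (40 with render)
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```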

Pricing:

  • Premium requests: 25x normal rate (25 API credits per request)
  • Premium + render: 40x normal rate (40 API credits per request)

Premium proxies support geotargeting. Combine with country_code:

Web Unlocker

For the most difficult-to-scrape websites that block even premium residential proxies, use our Web Unlocker feature. Web Unlocker automatically handles CAPTCHAs (PerimeterX, Cloudflare, AWS WAF), browser fingerprinting, and advanced anti-bot detection. Set unlock=true to use this feature.

Code:
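
A sketch of a Web Unlocker request:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "unlock": "true",  # CAPTCHA/anti-bot handling: 50 credits (75 with render)
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```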

Pricing:

  • Unlock requests: 50x normal rate (50 API credits per request)
  • Unlock + render: 75x normal rate (75 API credits per request)

Web Unlocker supports geotargeting. Combine with country_code:

Web Unlocker also works with render=true for JavaScript-rendered pages:

No Proxy (Direct Request)

To bypass the proxy and make a direct request from our servers, set no_proxy=true. This can be useful for testing or when the target site doesn't require proxy rotation.

Code:
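
A sketch of a direct (proxyless) request:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "no_proxy": "true",  # request goes straight from our servers
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```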

Query Stored Results

After a scrape request completes, you can retrieve the stored results later using the request ID returned with each response.

Code:
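
A sketch of retrieving a stored result; the /results/&lt;request_id&gt; path below is an assumption, so confirm the exact endpoint in your dashboard:

```python
from urllib.parse import urlencode

request_id = "abc123"  # placeholder for the ID returned with a scrape response
# NOTE: the /results/<id> path is hypothetical.
results_url = (
    "http://api.scrapeup.com/results/" + request_id
    + "?" + urlencode({"api_key": "YOUR_API_KEY"})
)
print(results_url)
```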

Account Information

Retrieve your current account usage and limits by sending a GET request to the /account endpoint with your API key.

Code:
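
A sketch of an account query; the response field names used in the comment match the Fields list below:

```python
from urllib.parse import urlencode

account_url = "http://api.scrapeup.com/account?" + urlencode(
    {"api_key": "YOUR_API_KEY"}
)
print(account_url)

# import requests
# info = requests.get(account_url, timeout=30).json()
# print(info["requestCount"], "/", info["requestLimit"])
```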

Result:

Fields:

  • concurrentRequests: Currently active concurrent requests
  • requestCount: Total successful requests this billing cycle
  • failedRequestCount: Total failed requests this billing cycle
  • requestLimit: Maximum requests allowed per billing cycle
  • concurrencyLimit: Maximum concurrent requests allowed

Parameter Reference

All parameters for GET requests (query string) and POST/PUT requests (JSON body).

| Parameter      | Type          | Required | Default | Description |
|----------------|---------------|----------|---------|-------------|
| api_key        | string        | Yes      | -       | Your API key for authentication. |
| url            | string        | Yes      | -       | Target URL to scrape. Must be URL-encoded in GET requests. Max 4098 characters. |
| method         | string        | No       | GET     | HTTP method for the target request. Accepts: GET, POST, PUT, PATCH, HEAD, DELETE, OPTIONS. |
| render         | boolean       | No       | false   | Render JavaScript using a headless browser. Costs 10x credits. |
| lazy_load      | boolean       | No       | false   | Scroll page to load lazy content. Requires render=true. |
| country_code   | string        | No       | -       | Two-letter country code for geotargeting (e.g., us, uk). |
| premium        | boolean       | No       | false   | Use premium residential proxy pool. Costs 25x credits (40x with render). |
| unlock         | boolean       | No       | false   | Use Web Unlocker for difficult-to-scrape sites with CAPTCHA/anti-bot protection. Costs 50x credits (75x with render). |
| session_number | string        | No       | -       | Session ID to reuse the same proxy IP across requests. Max 45 characters. |
| keep_headers   | boolean       | No       | false   | Forward your request headers to the target site. |
| no_proxy       | boolean       | No       | false   | Bypass proxy and make a direct request. |
| device         | string        | No       | -       | Device type identifier. Max 8 characters. |
| hash           | string        | No       | -       | URL hash/fragment for SPA navigation. Requires render=true. |
| base_url       | string        | No       | -       | Base URL for relative links. Use "domain" to auto-detect. Max 128 characters. |
| json           | object/string | No       | -       | JSON body for POST requests. Accepts object or stringified JSON. POST/PUT only. |
| form           | string        | No       | -       | URL-encoded form data for POST requests (e.g., "a=1&b=2"). POST/PUT only. |
| extract        | object        | No       | -       | JSON object with field names as keys and descriptions as values. Triggers AI extraction. Mutually exclusive with extract_prompt. POST only. |
| extract_prompt | string        | No       | -       | Natural language instruction for AI extraction. Mutually exclusive with extract. POST only. |
| extract_model  | string        | No       | fast    | AI model tier: fast, balanced, precision, or ultra. Only applies when using extraction. |
| output         | string        | No       | html    | Response format: html, markdown, or text. Markdown/text cost +5 credits. |
| include_html   | boolean       | No       | false   | Include raw HTML in response alongside extracted data or converted output. |

Error Codes

When a request fails, the API returns one of the following error codes:

| Status | Error                                 | Description |
|--------|---------------------------------------|-------------|
| 400    | INVALID_URL                           | The provided URL is not valid. |
| 400    | INVALID_BASE_URL                      | The provided base_url is not valid. |
| 400    | INVALID_COUNTRY_CODE                  | The country code is not supported for your plan or proxy type. |
| 400    | RENDER_NOT_ALLOWED_IN_THIS_PLAN       | Your plan does not support headless browser rendering. |
| 400    | PREMIUM_IPS_NOT_ALLOWED_IN_THIS_PLAN  | Your plan does not support premium residential proxies. |
| 400    | LOCATION_NOT_ALLOWED_IN_THIS_PLAN     | The requested geolocation is not available on your plan. |
| 400    | RESPONSE_SIZE_EXCEEDED                | Response exceeded the maximum size limit. You will not be charged. |
| 400    | EXTRACT_AND_PROMPT_MUTUALLY_EXCLUSIVE | Cannot use both extract and extract_prompt in the same request. |
| 400    | INVALID_EXTRACT_SCHEMA                | The extract object is invalid. Must be a JSON object with string values. |
| 400    | INVALID_EXTRACT_MODEL                 | Invalid extract_model. Must be fast, balanced, precision, or ultra. |
| 400    | INVALID_OUTPUT_FORMAT                 | Invalid output format. Must be html, markdown, or text. |
| 401    | INVALID_API_KEY                       | The API key provided is not valid. |
| 401    | MISSING_API_KEY                       | No API key was provided in the request. |
| 403    | REQUEST_LIMIT_EXCEEDED                | You have exceeded your monthly request limit. |
| 429    | CONCURRENCY_LIMIT_EXCEEDED            | Too many concurrent requests. Slow down your request rate. |
| 500    | REQUEST_TIME_OUT                      | All retry attempts failed within the timeout. You will not be charged. |

Code Examples

Python

Node.js (GET)

Node.js (POST with JSON body)

Python (AI Extraction)

Node.js (AI Extraction)