ScrapeUp API is designed to simplify web scraping. A few things to consider before we get started:
Send a GET request to http://api.scrapeup.com with two query parameters: api_key (your API key) and url (the URL to scrape). Make sure to URL-encode the target URL to avoid conflicts with query parameter separators.
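As a quick illustration, here is how that GET call can be assembled in Python using only the standard library. The endpoint comes from this section; the API key and target URL are placeholders.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://api.scrapeup.com"  # base endpoint from the docs

def build_scrape_url(api_key, target_url):
    # urlencode() percent-encodes the target URL, so its own
    # "?", "&" and "=" cannot clash with our query parameters.
    return API_ENDPOINT + "?" + urlencode({"api_key": api_key, "url": target_url})

request_url = build_scrape_url("YOUR_API_KEY", "https://example.com/page?id=1&lang=en")
```

Fetching `request_url` with any HTTP client (curl, `requests`, `urllib`) then returns the scraped page.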
Use a POST request to send payloads to the target URL. Send a JSON body to http://api.scrapeup.com/ with your parameters.
Use the json field. It accepts either a JSON object or a JSON-encoded string.
Use the form field with URL-encoded form data.
To send a content type other than application/json or application/x-www-form-urlencoded, set the Content-Type header and enable keep_headers.
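For reference, a sketch of the two POST payload shapes described above: only the body is built here, and how you send it (with the HTTP client of your choice) is up to you. All values are placeholders.

```python
import json

# Forward a JSON body to the target via the `json` field.
json_payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/post",
    "method": "POST",
    "json": {"username": "demo", "password": "secret"},
}

# Or forward URL-encoded form data via the `form` field.
form_payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/post",
    "method": "POST",
    "form": "username=demo&password=secret",
}

body = json.dumps(json_payload)  # this string becomes the POST body
```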
Notes:
- To use an HTTP method other than GET against the target, set the method parameter.
- Standard parameters (country_code, render, etc.) work in POST requests as body fields.
- The method parameter accepts: GET, POST, PUT, PATCH, HEAD, DELETE, OPTIONS.

ScrapeUp's AI extraction engine lets you describe the data you want in plain English and receive clean, structured JSON back. No CSS selectors, no XPath, no regex — just tell the API what you need.
Use the extract parameter with a JSON object where keys are the field names you want and values describe what to extract. This gives you precise control over the output structure.
Use the extract_prompt parameter with a natural language instruction. The AI will interpret your prompt and return structured data accordingly. This is useful when you want the AI to decide the best structure for the data.
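The two styles can be sketched as request bodies like this (field names, descriptions, and URLs are illustrative):

```python
# Field-by-field extraction: keys become the output fields,
# values describe what the AI should pull out.
extract_request = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/product",
    "extract": {
        "title": "the product name",
        "price": "the current price as a number",
        "in_stock": "whether the product is available",
    },
}

# Free-form extraction: the AI decides the output structure.
prompt_request = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/products",
    "extract_prompt": "List every product on the page with its name and price.",
}
```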
Notes:
- extract and extract_prompt are mutually exclusive — use one or the other, not both.
- Combine with render=true for JavaScript-rendered content.
- Set include_html=true to also receive the raw HTML alongside extracted data.

Choose an extraction model based on your speed, accuracy, and cost requirements using the extract_model parameter. If not specified, defaults to fast.
| Model | Best For | Credit Cost |
|---|---|---|
| fast | High-volume extraction where speed matters most. Great for simple, well-structured pages. | +5 credits |
| balanced | Good balance of accuracy and cost. Handles moderately complex pages well. | +8 credits |
| precision | Higher accuracy for complex or messy pages. Better at following nuanced instructions. | +10 credits |
| ultra | Maximum accuracy. Best for complex reasoning, multi-step extraction, and difficult pages. | +15 credits |
Extraction credits are added to the base request cost. For example, a standard scrape (1 credit) with extract_model=fast (+5 credits) costs 6 credits total. A rendered scrape (10 credits) with extract_model=precision (+10 credits) costs 20 credits total.
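The arithmetic above can be captured in a small helper. The surcharge table and base costs (1 credit standard, 10 rendered) come straight from this section; premium, unlock, and output-conversion costs are deliberately out of scope here.

```python
# Extraction surcharges per model tier, as documented above.
EXTRACTION_SURCHARGE = {"fast": 5, "balanced": 8, "precision": 10, "ultra": 15}

def request_cost(render=False, extract_model=None):
    """Total credits for a scrape with optional rendering and AI extraction."""
    base = 10 if render else 1  # rendered requests cost 10x
    surcharge = EXTRACTION_SURCHARGE[extract_model] if extract_model else 0
    return base + surcharge
```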
When extraction succeeds, the response includes:
- extracted — the structured data extracted from the page
- extraction_model — the model used for extraction
- credits_used — the total credits charged

If extraction fails (e.g., the page has no relevant content), the response includes:

- extracted: null
- extraction_error — a message explaining what went wrong
- html — the raw HTML is returned instead

You are not charged the extraction surcharge when extraction fails — only the base scrape cost applies.
Use the output parameter to convert scraped HTML into a different format without using AI extraction. This is useful when you want clean content but don't need structured data.
Set output=markdown to convert the page HTML to clean Markdown. Ideal for feeding content into your own processing pipelines or LLMs.
Set output=text to strip all HTML and return only the visible text content.
Markdown and text conversion costs +5 credits on top of the base request cost. Set include_html=true to also receive the original HTML alongside the converted output.
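A request body asking for Markdown output, with the original HTML included, might look like this (values are placeholders):

```python
markdown_request = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/article",
    "output": "markdown",   # or "text" for plain visible text
    "include_html": True,   # also return the original HTML
}
```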
For pages that require JavaScript to render content, set render=true. We use a headless Google Chrome instance with anti-detection measures to fetch the fully rendered page. The browser waits for network activity to settle before returning the HTML.
Each render request is charged at 10x the normal rate (1 render request = 10 API credits).
For pages that load content as you scroll (infinite scroll, lazy-loaded images), combine render=true with lazy_load=true. The browser will scroll to the bottom of the page to trigger all lazy-loaded content before returning the HTML.
Note: Lazy loading increases the request time. The scroll timeout is up to 90 seconds on top of the render timeout.
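Putting the two flags together, a request for a fully rendered, fully scrolled page could be sketched as (URL is a placeholder):

```python
render_request = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/feed",
    "render": True,     # headless Chrome; charged at 10x
    "lazy_load": True,  # scroll to the bottom before returning HTML
}
```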
To capture a screenshot of a rendered page, send a PUT request with render=true. The response will include a base64-encoded screenshot of the page.
For Single Page Applications that use hash-based routing (Angular, React, Vue), use the hash parameter with render=true. After the page loads, the browser will navigate to the specified hash route and wait for the content to render.
The hash value should be URL-encoded (e.g., #/dashboard becomes %23/dashboard).
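In Python, urllib.parse handles this encoding for you: quote() turns the leading # into %23, and urlencode() does the same when building a full query string (it additionally encodes the slash).

```python
from urllib.parse import quote, urlencode

hash_route = "#/dashboard"

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://app.example.com",
    "render": "true",
    "hash": hash_route,
}
query = urlencode(params)  # percent-encodes '#' (and '/') in the hash value
```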
To pass custom headers (User-Agent, cookies, etc.) to the target site, set keep_headers=true. Your request headers will be forwarded to the target. Use this only when you need customized results — anti-blocking is already handled internally.
Note: Some headers are always overridden for security (Host, Accept-Encoding, Connection).
To reuse the same proxy IP across multiple requests, use the session_number parameter with any string value. All requests with the same session number will route through the same proxy IP. Send a different session number to get a new IP.
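For example, a small crawl that pins all its requests to one proxy IP (session id and URLs are illustrative):

```python
SESSION_ID = "crawl-2024-001"  # any string, max 45 characters

def session_params(api_key, url):
    # Same session_number => same proxy IP across requests.
    return {"api_key": api_key, "url": url, "session_number": SESSION_ID}

page1 = session_params("YOUR_API_KEY", "https://example.com/page/1")
page2 = session_params("YOUR_API_KEY", "https://example.com/page/2")
```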
Route your requests through a specific country using the country_code parameter with a two-letter country code (e.g., us, uk, de).
Available countries depend on your plan:
When using premium=true, a broader set of residential proxy countries is available.
Our standard proxy pool includes millions of datacenter proxies and handles the vast majority of scraping jobs. For particularly difficult sites, we also maintain a pool of residential IPs. Set premium=true to use this pool.
Pricing: each premium request costs 25 API credits, or 40 credits when combined with render=true.
Premium proxies support geotargeting. Combine with country_code:
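A premium request routed through, say, German residential IPs could look like this (country choice and URL are illustrative):

```python
premium_request = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "premium": True,       # residential proxy pool
    "country_code": "de",  # geotarget the residential IPs
}
```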
For the most difficult-to-scrape websites that block even premium residential proxies, use our Web Unlocker feature. Web Unlocker automatically handles CAPTCHAs (PerimeterX, Cloudflare, AWS WAF), browser fingerprinting, and advanced anti-bot detection. Set unlock=true to use this feature.
Pricing: each Web Unlocker request costs 50 API credits, or 75 credits when combined with render=true.
Web Unlocker supports geotargeting. Combine with country_code:
Web Unlocker also works with render=true for JavaScript-rendered pages:
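A sketch combining Web Unlocker with both geotargeting and rendering (target URL and country are placeholders):

```python
unlock_request = {
    "api_key": "YOUR_API_KEY",
    "url": "https://protected.example.com",
    "unlock": True,        # CAPTCHA / anti-bot handling
    "country_code": "us",  # geotargeting works with unlock
    "render": True,        # as does JavaScript rendering
}
```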
To bypass the proxy and make a direct request from our servers, set no_proxy=true. This can be useful for testing or when the target site doesn't require proxy rotation.
After a scrape request completes, you can retrieve the stored results later using the request ID returned with each response.
Retrieve your current account usage and limits by sending a GET request to the /account endpoint with your API key.
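Building that call with the standard library (the /account path comes from this section; the key is a placeholder):

```python
from urllib.parse import urlencode

account_url = "http://api.scrapeup.com/account?" + urlencode({"api_key": "YOUR_API_KEY"})
```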
Fields:
All parameters for GET requests (query string) and POST/PUT requests (JSON body).
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| api_key | string | Yes | - | Your API key for authentication |
| url | string | Yes | - | Target URL to scrape. Must be URL-encoded in GET requests. Max 4098 characters. |
| method | string | No | GET | HTTP method for the target request. Accepts: GET, POST, PUT, PATCH, HEAD, DELETE, OPTIONS. |
| render | boolean | No | false | Render JavaScript using a headless browser. Costs 10x credits. |
| lazy_load | boolean | No | false | Scroll page to load lazy content. Requires render=true. |
| country_code | string | No | - | Two-letter country code for geotargeting (e.g., us, uk). |
| premium | boolean | No | false | Use premium residential proxy pool. Costs 25x credits (40x with render). |
| unlock | boolean | No | false | Use Web Unlocker for difficult-to-scrape sites with CAPTCHA/anti-bot protection. Costs 50x credits (75x with render). |
| session_number | string | No | - | Session ID to reuse the same proxy IP across requests. Max 45 characters. |
| keep_headers | boolean | No | false | Forward your request headers to the target site. |
| no_proxy | boolean | No | false | Bypass proxy and make a direct request. |
| device | string | No | - | Device type identifier. Max 8 characters. |
| hash | string | No | - | URL hash/fragment for SPA navigation. Requires render=true. |
| base_url | string | No | - | Base URL for relative links. Use "domain" to auto-detect. Max 128 characters. |
| json | object/string | No | - | JSON body for POST requests. Accepts object or stringified JSON. POST/PUT only. |
| form | string | No | - | URL-encoded form data for POST requests (e.g., "a=1&b=2"). POST/PUT only. |
| extract | object | No | - | JSON object with field names as keys and descriptions as values. Triggers AI extraction. Mutually exclusive with extract_prompt. POST only. |
| extract_prompt | string | No | - | Natural language instruction for AI extraction. Mutually exclusive with extract. POST only. |
| extract_model | string | No | fast | AI model tier: fast, balanced, precision, or ultra. Only applies when using extraction. |
| output | string | No | html | Response format: html, markdown, or text. Markdown/text cost +5 credits. |
| include_html | boolean | No | false | Include raw HTML in response alongside extracted data or converted output. |
When a request fails, the API returns one of the following error codes:
| Status | Error | Description |
|---|---|---|
| 400 | INVALID_URL | The provided URL is not valid. |
| 400 | INVALID_BASE_URL | The provided base_url is not valid. |
| 400 | INVALID_COUNTRY_CODE | The country code is not supported for your plan or proxy type. |
| 400 | RENDER_NOT_ALLOWED_IN_THIS_PLAN | Your plan does not support headless browser rendering. |
| 400 | PREMIUM_IPS_NOT_ALLOWED_IN_THIS_PLAN | Your plan does not support premium residential proxies. |
| 400 | LOCATION_NOT_ALLOWED_IN_THIS_PLAN | The requested geolocation is not available on your plan. |
| 400 | RESPONSE_SIZE_EXCEEDED | Response exceeded the maximum size limit. You will not be charged. |
| 401 | INVALID_API_KEY | The API key provided is not valid. |
| 401 | MISSING_API_KEY | No API key was provided in the request. |
| 403 | REQUEST_LIMIT_EXCEEDED | You have exceeded your monthly request limit. |
| 429 | CONCURRENCY_LIMIT_EXCEEDED | Too many concurrent requests. Slow down your request rate. |
| 500 | REQUEST_TIME_OUT | All retry attempts failed within the timeout. You will not be charged. |
| 400 | EXTRACT_AND_PROMPT_MUTUALLY_EXCLUSIVE | Cannot use both extract and extract_prompt in the same request. |
| 400 | INVALID_EXTRACT_SCHEMA | The extract object is invalid. Must be a JSON object with string values. |
| 400 | INVALID_EXTRACT_MODEL | Invalid extract_model. Must be fast, balanced, precision, or ultra. |
| 400 | INVALID_OUTPUT_FORMAT | Invalid output format. Must be html, markdown, or text. |
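One way a client might act on this table: retry only the transient failures and surface everything else. The status codes and error names come from the table above; the retry policy itself is an assumption, not API guidance.

```python
# Errors worth retrying: temporary capacity or timeout conditions.
RETRYABLE = {"CONCURRENCY_LIMIT_EXCEEDED", "REQUEST_TIME_OUT"}

def should_retry(status, error):
    """True only for transient failures; 4xx configuration errors
    should be fixed in the request, not retried."""
    return status in (429, 500) and error in RETRYABLE
```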