Getting Started

ScrapeUp API is designed to simplify web scraping. A few things to consider before we get started:

  • Each request is retried until it succeeds, for up to 60 seconds. Set your client timeout to at least 70 seconds to allow for retries. If no attempt succeeds within the 60-second window, we return a 500 error. You are not charged for unsuccessful requests (only responses with 200 and 404 status codes are billed).
  • If you exceed your plan's concurrent connection limit, the API will respond with a 429 status code. Slow down your request rate to resolve this.
  • Each request returns the raw HTML from the target page, along with response headers and cookies.
  • You can scrape images, PDFs, or other files just as you would any other URL. There is a response size limit of approximately 8 MB per request.

Basic Usage

Send a GET request to http://api.scrapeup.com with two query parameters: api_key (your API key) and url (the URL to scrape). Make sure to URL-encode the target URL to avoid conflicts with query parameter separators.

Code:
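
A minimal Python sketch using only the standard library to build the request URL (the target URL is a placeholder, and the live call is left commented out because it needs the third-party requests package):

```python
from urllib.parse import urlencode

# urlencode() percent-encodes the target URL so its own "?" and "&"
# don't clash with the api.scrapeup.com query parameters.
params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/anything?page=2",
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)

# Live call (requires the third-party `requests` package):
# import requests
# html = requests.get(request_url, timeout=70).text  # timeout >= 70s for retries
```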

Result:

POST Requests

Use a POST request to send payloads to the target URL. Send a JSON body to http://api.scrapeup.com/ with your parameters.

Sending JSON data:

Use the json field. It accepts either a JSON object or a stringified JSON string.
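
A sketch of the request body, assuming the field names shown above (httpbin.org stands in for a real target; the live POST is commented out):

```python
import json

payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/post",
    "json": {"title": "Hello", "views": 42},  # forwarded as the target's POST body
}
body = json.dumps(payload)
print(body)

# import requests
# resp = requests.post("http://api.scrapeup.com/", json=payload, timeout=70)
```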

Sending form data:

Use the form field with URL-encoded form data.
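
A sketch of a form-data request; urlencode() produces the URL-encoded string the form field expects:

```python
from urllib.parse import urlencode

payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/post",
    # Forwarded to the target as application/x-www-form-urlencoded
    "form": urlencode({"username": "alice", "page": "2"}),
}
print(payload["form"])

# import requests
# resp = requests.post("http://api.scrapeup.com/", json=payload, timeout=70)
```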

Custom content types:

To send a content type other than application/json or application/x-www-form-urlencoded, set the Content-Type header and enable keep_headers.
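
A sketch for a custom content type; note that passing the raw body as the request data while the ScrapeUp parameters ride in the query string is an assumption about the transport, so verify against your dashboard examples:

```python
from urllib.parse import urlencode

# keep_headers=true tells the API to forward our headers to the target.
params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/post",
    "keep_headers": "true",
})
headers = {"Content-Type": "application/xml"}
xml_body = "<note><to>Team</to></note>"
print(headers["Content-Type"])

# import requests
# resp = requests.post("http://api.scrapeup.com?" + params,
#                      headers=headers, data=xml_body, timeout=70)
```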

Notes:

  1. By default, POST requests are forwarded as POST to the target. PUT requests are forwarded as PUT. Override this with the method parameter.
  2. All GET parameters (country_code, render, etc.) work in POST requests as body fields.
  3. The method parameter accepts: GET, POST, PUT, PATCH, HEAD, DELETE, OPTIONS.

AI Extraction

ScrapeUp's AI extraction engine lets you describe the data you want in plain English and receive clean, structured JSON back. No CSS selectors, no XPath, no regex — just tell the API what you need.

Schema-Based Extraction

Use the extract parameter with a JSON object where keys are the field names you want and values describe what to extract. This gives you precise control over the output structure.
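
A sketch of an extraction schema (the target URL and field descriptions are illustrative):

```python
# Keys become the output field names; values are plain-English
# descriptions of what the AI should pull from the page.
payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/product/123",
    "extract": {
        "name": "the product name",
        "price": "the current price as a number",
        "in_stock": "whether the product is in stock, true or false",
    },
}
print(sorted(payload["extract"]))

# import requests
# data = requests.post("http://api.scrapeup.com/", json=payload, timeout=70).json()
```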

Result:

Prompt-Based Extraction

Use the extract_prompt parameter with a natural language instruction. The AI will interpret your prompt and return structured data accordingly. This is useful when you want the AI to decide the best structure for the data.
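
A sketch of a prompt-based request; note that extract is omitted, since the two parameters are mutually exclusive:

```python
payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/blog",
    # The AI decides the output structure from this instruction.
    "extract_prompt": "List every article with its title, author, and publication date.",
}
print(payload["extract_prompt"])

# import requests
# data = requests.post("http://api.scrapeup.com/", json=payload, timeout=70).json()
```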

Result:

Notes:

  • extract and extract_prompt are mutually exclusive — use one or the other, not both.
  • Extraction works with any page. Combine with render=true for JavaScript-rendered content.
  • Set include_html=true to also receive the raw HTML alongside extracted data.

Extraction Models

Choose an extraction model based on your speed, accuracy, and cost requirements using the extract_model parameter. If not specified, defaults to fast.

| Model     | Best For                                                                                  | Credit Cost |
|-----------|-------------------------------------------------------------------------------------------|-------------|
| fast      | High-volume extraction where speed matters most. Great for simple, well-structured pages. | +5 credits  |
| balanced  | Good balance of accuracy and cost. Handles moderately complex pages well.                 | +8 credits  |
| precision | Higher accuracy for complex or messy pages. Better at following nuanced instructions.     | +10 credits |
| ultra     | Maximum accuracy. Best for complex reasoning, multi-step extraction, and difficult pages. | +15 credits |

Extraction credits are added to the base request cost. For example, a standard scrape (1 credit) with extract_model=fast (+5 credits) costs 6 credits total. A rendered scrape (10 credits) with extract_model=precision (+10 credits) costs 20 credits total.

Specifying a Model:
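
A sketch that selects a model tier and works through the credit arithmetic from the table above:

```python
payload = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/product/123",
    "extract": {"name": "the product name", "price": "the current price"},
    "extract_model": "precision",  # fast | balanced | precision | ultra
}

# Cost check: standard scrape (1 credit) + precision surcharge (10 credits)
base_cost, model_surcharge = 1, 10
print(base_cost + model_surcharge)  # 11 credits total
```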

Extraction Response

When extraction succeeds, the response includes:

  • extracted — the structured data extracted from the page
  • extraction_model — the model used for extraction
  • credits_used — the total credits charged

If extraction fails (e.g., the page has no relevant content), the response includes:

  • extracted: null
  • extraction_error — a message explaining what went wrong
  • html — the raw HTML is returned instead

You are not charged the extraction surcharge when extraction fails — only the base scrape cost applies.

Output Formats

Use the output parameter to convert scraped HTML into a different format without using AI extraction. This is useful when you want clean content but don't need structured data.

Markdown

Set output=markdown to convert the page HTML to clean Markdown. Ideal for feeding content into your own processing pipelines or LLMs.
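
A sketch of a Markdown-conversion request; the boolean-free output parameter is passed as a plain query value:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/article",
    "output": "markdown",  # or "text" for plain text; either costs +5 credits
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)

# import requests
# markdown = requests.get(request_url, timeout=70).text
```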

Result:

Plain Text

Set output=text to strip all HTML and return only the visible text content.

Markdown and text conversion costs +5 credits on top of the base request cost. Set include_html=true to also receive the original HTML alongside the converted output.

Rendering JavaScript

For pages that require JavaScript to render content, set render=true. We use a headless Google Chrome instance with anti-detection measures to fetch the fully rendered page. The browser waits for network activity to settle before returning the HTML.

Code:
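
A sketch of a render request; passing the flag as the literal string true is an assumption about the query-string encoding of boolean parameters:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/spa",
    "render": "true",  # headless Chrome; 10 credits per request
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```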

Each render request is charged at 10x the normal rate (1 render request = 10 API credits).

Lazy Loading

For pages that load content as you scroll (infinite scroll, lazy-loaded images), combine render=true with lazy_load=true. The browser will scroll to the bottom of the page to trigger all lazy-loaded content before returning the HTML.

Code:
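
A sketch combining the two flags; lazy_load only has an effect alongside render:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/feed",
    "render": "true",
    "lazy_load": "true",  # scroll to the bottom before returning the HTML
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```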

Note: Lazy loading increases the request time. The scroll timeout is up to 90 seconds on top of the render timeout.

Screenshots

To capture a screenshot of a rendered page, send a PUT request with render=true. The response will include a base64-encoded screenshot of the page.

Code:
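
A sketch of the PUT request and the base64 decode step; the screenshot field name in the JSON response is an assumption, so check an actual response for the exact key:

```python
import base64
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "render": "true",
})
request_url = "http://api.scrapeup.com?" + params

# Live call (field name `screenshot` is an assumption):
# import requests
# resp = requests.put(request_url, timeout=120).json()
# with open("page.png", "wb") as f:
#     f.write(base64.b64decode(resp["screenshot"]))

# Demonstrate the decode step on a stand-in value:
demo = base64.b64encode(b"\x89PNG").decode()
print(base64.b64decode(demo))
```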

SPA / Hash Routing

For Single Page Applications that use hash-based routing (Angular, React, Vue), use the hash parameter with render=true. After the page loads, the browser will navigate to the specified hash route and wait for the content to render.

Code:
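
A sketch of a hash-routing request; urlencode() handles the %23 escaping described below (it also encodes the slash, which is harmless):

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/app",
    "render": "true",
    "hash": "#/dashboard",  # encoded to %23%2Fdashboard automatically
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```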

The hash value should be URL-encoded (e.g., #/dashboard becomes %23/dashboard).

Custom Headers

To pass custom headers (User-Agent, cookies, etc.) to the target site, set keep_headers=true. Your request headers will be forwarded to the target. Only use this for customized results — we handle anti-blocking internally.

Code:
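
A sketch that forwards a custom header and cookie; the header names are illustrative:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/headers",
    "keep_headers": "true",
})
request_url = "http://api.scrapeup.com?" + params
# These are forwarded to the target (Host, Accept-Encoding, and
# Connection are always overridden for security).
headers = {"X-My-Header": "hello", "Cookie": "session=abc123"}
print(request_url)

# import requests
# html = requests.get(request_url, headers=headers, timeout=70).text
```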

Result:

Note: Some headers are always overridden for security (Host, Accept-Encoding, Connection).

Sessions

To reuse the same proxy IP across multiple requests, use the session_number parameter with any string value. All requests with the same session number will route through the same proxy IP. Send a different session number to get a new IP.

Code:
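
A sketch of session reuse; the session value is an arbitrary string you choose:

```python
from urllib.parse import urlencode

def scrape_url(target, session):
    # The same session value routes every request through the same proxy IP.
    return "http://api.scrapeup.com?" + urlencode({
        "api_key": "YOUR_API_KEY",
        "url": target,
        "session_number": session,
    })

first = scrape_url("https://httpbin.org/ip", "session-7")
second = scrape_url("https://httpbin.org/ip", "session-7")  # same IP as `first`
print(first == second)
```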

Result:

Geographic Location

Route your requests through a specific country using the country_code parameter with a two-letter country code (e.g., us, uk, de).

Code:
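
A sketch routing a request through Germany; httpbin.org/ip echoes back the IP the target sees:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://httpbin.org/ip",
    "country_code": "de",  # two-letter code; availability depends on plan
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```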

Available countries depend on your plan:

  • Startup: United States (us)
  • Business: US, Canada (ca), United Kingdom (uk), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Japan (jp), China (cn), Australia (au)
  • Enterprise: All countries available upon request

When using premium=true, a broader set of residential proxy countries is available.

Premium Residential Proxies

Our standard proxy pool includes millions of datacenter proxies and handles the vast majority of scraping jobs. For particularly difficult sites, we also maintain a pool of residential IPs. Set premium=true to use this pool.

Code:
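
A sketch of a premium request; the flag is the only change from a standard call:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "premium": "true",  # residential pool: 25 credits (40 with render)
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```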

Pricing:

  • Premium requests: 25x normal rate (25 API credits per request)
  • Premium + render: 40x normal rate (40 API credits per request)

Premium proxies support geotargeting. Combine with country_code:

Web Unlocker

For the most difficult-to-scrape websites that block even premium residential proxies, use our Web Unlocker feature. Web Unlocker automatically handles CAPTCHAs (PerimeterX, Cloudflare, AWS WAF), browser fingerprinting, and advanced anti-bot detection. Set unlock=true to use this feature.

Code:
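
A sketch of a Web Unlocker request:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "unlock": "true",  # CAPTCHA/anti-bot handling: 50 credits (75 with render)
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```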

Pricing:

  • Unlock requests: 50x normal rate (50 API credits per request)
  • Unlock + render: 75x normal rate (75 API credits per request)

Web Unlocker supports geotargeting. Combine with country_code:

Web Unlocker also works with render=true for JavaScript-rendered pages:

No Proxy (Direct Request)

To bypass the proxy and make a direct request from our servers, set no_proxy=true. This can be useful for testing or when the target site doesn't require proxy rotation.

Code:
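
A sketch of a direct (proxyless) request:

```python
from urllib.parse import urlencode

params = urlencode({
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com",
    "no_proxy": "true",  # request goes straight from our servers
})
request_url = "http://api.scrapeup.com?" + params
print(request_url)
```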

Query Stored Results

After a scrape request completes, you can retrieve the stored results later using the request ID returned with each response.

Code:
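
A sketch of retrieving a stored result; the /results/&lt;request_id&gt; path below is an assumption, so confirm the exact endpoint in your dashboard:

```python
from urllib.parse import urlencode

request_id = "abc123"  # placeholder for the ID returned with a scrape response
# NOTE: the /results/<id> path is hypothetical.
results_url = (
    "http://api.scrapeup.com/results/" + request_id
    + "?" + urlencode({"api_key": "YOUR_API_KEY"})
)
print(results_url)
```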

Account Information

Retrieve your current account usage and limits by sending a GET request to the /account endpoint with your API key.

Code:
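
A sketch of an account query; the response field names used in the comment match the Fields list below:

```python
from urllib.parse import urlencode

account_url = "http://api.scrapeup.com/account?" + urlencode(
    {"api_key": "YOUR_API_KEY"}
)
print(account_url)

# import requests
# info = requests.get(account_url, timeout=30).json()
# print(info["requestCount"], "/", info["requestLimit"])
```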

Result:

Fields:

  • concurrentRequests: Currently active concurrent requests
  • requestCount: Total successful requests this billing cycle
  • failedRequestCount: Total failed requests this billing cycle
  • requestLimit: Maximum requests allowed per billing cycle
  • concurrencyLimit: Maximum concurrent requests allowed

Parameter Reference

All parameters for GET requests (query string) and POST/PUT requests (JSON body).

| Parameter      | Type          | Required | Default | Description |
|----------------|---------------|----------|---------|-------------|
| api_key        | string        | Yes      | -       | Your API key for authentication. |
| url            | string        | Yes      | -       | Target URL to scrape. Must be URL-encoded in GET requests. Max 4098 characters. |
| method         | string        | No       | GET     | HTTP method for the target request. Accepts: GET, POST, PUT, PATCH, HEAD, DELETE, OPTIONS. |
| render         | boolean       | No       | false   | Render JavaScript using a headless browser. Costs 10x credits. |
| lazy_load      | boolean       | No       | false   | Scroll page to load lazy content. Requires render=true. |
| country_code   | string        | No       | -       | Two-letter country code for geotargeting (e.g., us, uk). |
| premium        | boolean       | No       | false   | Use premium residential proxy pool. Costs 25x credits (40x with render). |
| unlock         | boolean       | No       | false   | Use Web Unlocker for difficult-to-scrape sites with CAPTCHA/anti-bot protection. Costs 50x credits (75x with render). |
| session_number | string        | No       | -       | Session ID to reuse the same proxy IP across requests. Max 45 characters. |
| keep_headers   | boolean       | No       | false   | Forward your request headers to the target site. |
| no_proxy       | boolean       | No       | false   | Bypass proxy and make a direct request. |
| device         | string        | No       | -       | Device type identifier. Max 8 characters. |
| hash           | string        | No       | -       | URL hash/fragment for SPA navigation. Requires render=true. |
| base_url       | string        | No       | -       | Base URL for relative links. Use "domain" to auto-detect. Max 128 characters. |
| json           | object/string | No       | -       | JSON body for POST requests. Accepts object or stringified JSON. POST/PUT only. |
| form           | string        | No       | -       | URL-encoded form data for POST requests (e.g., "a=1&b=2"). POST/PUT only. |
| extract        | object        | No       | -       | JSON object with field names as keys and descriptions as values. Triggers AI extraction. Mutually exclusive with extract_prompt. POST only. |
| extract_prompt | string        | No       | -       | Natural language instruction for AI extraction. Mutually exclusive with extract. POST only. |
| extract_model  | string        | No       | fast    | AI model tier: fast, balanced, precision, or ultra. Only applies when using extraction. |
| output         | string        | No       | html    | Response format: html, markdown, or text. Markdown/text cost +5 credits. |
| include_html   | boolean       | No       | false   | Include raw HTML in response alongside extracted data or converted output. |

Error Codes

When a request fails, the API returns one of the following error codes:

| Status | Error                                 | Description |
|--------|---------------------------------------|-------------|
| 400    | INVALID_URL                           | The provided URL is not valid. |
| 400    | INVALID_BASE_URL                      | The provided base_url is not valid. |
| 400    | INVALID_COUNTRY_CODE                  | The country code is not supported for your plan or proxy type. |
| 400    | RENDER_NOT_ALLOWED_IN_THIS_PLAN       | Your plan does not support headless browser rendering. |
| 400    | PREMIUM_IPS_NOT_ALLOWED_IN_THIS_PLAN  | Your plan does not support premium residential proxies. |
| 400    | LOCATION_NOT_ALLOWED_IN_THIS_PLAN     | The requested geolocation is not available on your plan. |
| 400    | RESPONSE_SIZE_EXCEEDED                | Response exceeded the maximum size limit. You will not be charged. |
| 400    | EXTRACT_AND_PROMPT_MUTUALLY_EXCLUSIVE | Cannot use both extract and extract_prompt in the same request. |
| 400    | INVALID_EXTRACT_SCHEMA                | The extract object is invalid. Must be a JSON object with string values. |
| 400    | INVALID_EXTRACT_MODEL                 | Invalid extract_model. Must be fast, balanced, precision, or ultra. |
| 400    | INVALID_OUTPUT_FORMAT                 | Invalid output format. Must be html, markdown, or text. |
| 401    | INVALID_API_KEY                       | The API key provided is not valid. |
| 401    | MISSING_API_KEY                       | No API key was provided in the request. |
| 403    | REQUEST_LIMIT_EXCEEDED                | You have exceeded your monthly request limit. |
| 429    | CONCURRENCY_LIMIT_EXCEEDED            | Too many concurrent requests. Slow down your request rate. |
| 500    | REQUEST_TIME_OUT                      | All retry attempts failed within the timeout. You will not be charged. |

Code Examples

Python

Node.js (GET)

Node.js (POST with JSON body)

Python (AI Extraction)

Node.js (AI Extraction)