OnPage API Raw HTML
This endpoint returns the HTML of a page you indicate in the request.
Note: to use this endpoint, make sure the store_raw_html parameter in the Task Post request is set to true.
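As a rough illustration of the note above, the task body sent to the Task Post endpoint could be assembled like this. This is a minimal sketch, not the definitive request format: the helper name `build_task_post_payload` is hypothetical, and the `target` field name is an assumption that does not appear on this page. Only `store_raw_html` and `max_crawl_pages` are taken from the text and the sample response.

```php
<?php
// Hypothetical helper: builds a Task Post body with raw HTML storage enabled.
// "target" is an assumed field name; check the Task Post documentation.
function build_task_post_payload(string $target, int $maxCrawlPages): array
{
    return array(
        array(
            "target" => $target,                 // assumption: the site to crawl
            "max_crawl_pages" => $maxCrawlPages, // crawl limit echoed back in crawl_status
            "store_raw_html" => true,            // required for the Raw HTML endpoint
        ),
    );
}

$payload = build_task_post_payload("dataforseo.com", 10);
echo json_encode($payload, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), "\n";
```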
Instead of ‘login’ and ‘password’ use your credentials from https://app.dataforseo.com/api-access

    <?php
    // You can download this file from here https://cdn.dataforseo.com/v3/examples/php/php_RestClient.zip
    require('RestClient.php');
    $api_url = 'https://api.dataforseo.com/';
    // Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-access
    $client = new RestClient($api_url, null, 'login', 'password');

    $post_array = array();
    // simple way to get a result
    $post_array[] = array(
        "id" => "07281559-0695-0216-0000-c269be8b7592",
        "url" => "https://dataforseo.com/apis"
    );
    try {
        // POST /v3/on_page/raw_html
        // the full list of possible parameters is available in documentation
        $result = $client->post('/v3/on_page/raw_html', $post_array);
        print_r($result);
        // do something with post result
    } catch (RestClientException $e) {
        echo "\n";
        print "HTTP code: {$e->getHttpCode()}\n";
        print "Error code: {$e->getCode()}\n";
        print "Message: {$e->getMessage()}\n";
        print $e->getTraceAsString();
        echo "\n";
    }
    $client = null;
    ?>

The above command returns JSON structured like this:

    {
      "version": "0.1.20200805",
      "status_code": 20000,
      "status_message": "Ok.",
      "time": "0.0896 sec.",
      "cost": 0,
      "tasks_count": 1,
      "tasks_error": 0,
      "tasks": [
        {
          "id": "07281559-0695-0216-0000-c269be8b7592",
          "status_code": 20000,
          "status_message": "Ok.",
          "time": "0.0214 sec.",
          "cost": 0,
          "result_count": 1,
          "path": [
            "v3",
            "on_page",
            "raw_html"
          ],
          "data": {
            "api": "on_page",
            "function": "raw_html",
            "url": "https://dataforseo.com/apis"
          },
          "result": [
            {
              "crawl_progress": "in_progress",
              "crawl_status": {
                "max_crawl_pages": 10,
                "pages_in_queue": 0,
                "pages_crawled": 10
              },
              "items_count": 1,
              "items": {
                "html": "<!doctype html><html><body><head></head></body></html>"
              }
            }
          ]
        }
      ]
    }

All POST data should be sent in the JSON format (UTF-8 encoding). The task setting is done using the POST method. When setting a task, you should send all task parameters in the task array of the generic POST array.
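To make the "task array inside the generic POST array" wording concrete, here is a minimal sketch of the JSON body for this endpoint. The helper name `build_raw_html_body` is hypothetical; the `id` and `url` values are the ones used in the sample request on this page.

```php
<?php
// Hypothetical helper: wraps one task's parameters in the generic POST array
// and serializes the whole array as UTF-8 JSON.
function build_raw_html_body(string $taskId, string $url): string
{
    $post_array = array();
    // each element of the outer array is one task
    $post_array[] = array("id" => $taskId, "url" => $url);
    return json_encode($post_array, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
}

$body = build_raw_html_body(
    "07281559-0695-0216-0000-c269be8b7592",
    "https://dataforseo.com/apis"
);
echo $body, "\n";
// prints [{"id":"07281559-0695-0216-0000-c269be8b7592","url":"https://dataforseo.com/apis"}]
```

The outer brackets are the generic POST array, and the single object inside is the task.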
Description of the fields for setting a task:
Field name | Type | Description |
---|---|---|
id | string | ID of the task. Required field. You can get this ID in the response of the Task POST endpoint. Example: "07131248-1535-0216-1000-17384017ad04" |
url | string | Page URL. Required field. The absolute URL of the page whose HTML you want to request. Note: this field is optional if the task was set using the Instant Pages endpoint |
In response, the API server returns JSON-encoded data containing a tasks array with the information specific to the set tasks.
Description of the fields in the results array:
Field name | Type | Description |
---|---|---|
version | string | the current version of the API |
status_code | integer | general status code; the full list of response codes is available in the documentation. Note: we strongly recommend designing a system for handling related exceptional or error conditions |
status_message | string | general informational message; the full list of general informational messages is available in the documentation |
time | string | execution time, seconds |
cost | float | total tasks cost, USD |
tasks_count | integer | the number of tasks in the tasks array |
tasks_error | integer | the number of tasks in the tasks array returned with an error |
tasks | array | array of tasks |
id | string | task identifier; unique task identifier in our system in the UUID format |
status_code | integer | status code of the task generated by DataForSEO; can be within the following range: 10000-60000; the full list of response codes is available in the documentation |
status_message | string | informational message of the task; the full list of general informational messages is available in the documentation |
time | string | execution time, seconds |
cost | float | cost of the task, USD |
result_count | integer | number of elements in the result array |
path | array | URL path |
data | object | contains the same parameters that you specified in the POST request |
result | array | array of results |
crawl_progress | string | status of the crawling session; possible values: in_progress, finished |
crawl_status | object | details of the crawling session |
max_crawl_pages | integer | maximum number of pages to crawl; indicates the max_crawl_pages limit you specified when setting a task |
pages_in_queue | integer | number of pages that are currently in the crawling queue |
pages_crawled | integer | number of crawled pages |
items_count | integer | number of items in the results array |
items | array | items array |
html | string | HTML of the page |
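The fields above can be read out of a decoded response roughly as follows. This is a sketch under stated assumptions rather than an official client: the helper name `extract_raw_html` is hypothetical, and treating 20000 as the only success code is inferred from the sample response, not stated in the field descriptions. Because the sample response shows `items` as an object while the table lists it as an array, the sketch accepts both shapes.

```php
<?php
// Hypothetical helper: returns the HTML string from a decoded Raw HTML response.
// Throws when the general or task-level status_code is not 20000
// (assumed success code, taken from the sample response).
function extract_raw_html(array $response): string
{
    if (($response["status_code"] ?? null) !== 20000) {
        throw new RuntimeException("Request failed: " . ($response["status_message"] ?? "unknown"));
    }
    $task = $response["tasks"][0] ?? null;
    if ($task === null) {
        throw new RuntimeException("No tasks in response");
    }
    if (($task["status_code"] ?? null) !== 20000) {
        throw new RuntimeException("Task failed: " . ($task["status_message"] ?? "unknown"));
    }
    $items = $task["result"][0]["items"] ?? null;
    // object form, as in the sample response
    if (is_array($items) && isset($items["html"])) {
        return $items["html"];
    }
    // array form, as listed in the field table
    if (is_array($items) && isset($items[0]["html"])) {
        return $items[0]["html"];
    }
    throw new RuntimeException("No HTML found in response");
}

// Trimmed copy of the sample response shown earlier on this page.
$json = '{"status_code":20000,"status_message":"Ok.","tasks":[{"status_code":20000,'
      . '"status_message":"Ok.","result":[{"items_count":1,"items":'
      . '{"html":"<!doctype html><html><body><head></head></body></html>"}}]}]}';
$html = extract_raw_html(json_decode($json, true));
echo $html, "\n";
```

Checking both status codes before touching `result` keeps a failed task from surfacing as a confusing "missing key" error later on.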