OnPage API Raw HTML

This endpoint returns the HTML of the page you specify in the request.

Note: to use this endpoint, make sure the store_raw_html parameter in the Task POST request is set to true.
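
As a rough illustration of the note above, a Task POST body that enables raw HTML storage might be built as follows. This is a minimal Python sketch, not the official client: the helper name `build_task_post_payload` and the `target` parameter name are assumptions made for illustration; only `store_raw_html` and `max_crawl_pages` are named on this page. Check the Task POST documentation for the actual parameter list.

```python
import json


def build_task_post_payload(target, max_crawl_pages=10):
    """Build a JSON body for a crawl task that keeps the raw HTML of each page.

    `target` is an assumed parameter name for the site to crawl.
    """
    task = {
        "target": target,                  # assumption: not documented on this page
        "max_crawl_pages": max_crawl_pages,
        "store_raw_html": True,            # must be true to use the Raw HTML endpoint
    }
    # Tasks are sent as elements of an array, even when there is only one.
    return json.dumps([task])


if __name__ == "__main__":
    print(build_task_post_payload("dataforseo.com"))
```

Without `store_raw_html` set to true when the task is created, there would be no stored HTML for this endpoint to return.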

Instead of 'login' and 'password', use your credentials from https://app.dataforseo.com/api-access.

<?php
// You can download this file from here https://cdn.dataforseo.com/v3/examples/php/php_RestClient.zip
require('RestClient.php');
$api_url = 'https://api.dataforseo.com/';
// Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-access
$client = new RestClient($api_url, null, 'login', 'password');

$post_array = array();
// simple way to get a result
$post_array[] = array(
   "id" => "07281559-0695-0216-0000-c269be8b7592",
   "url" => "https://dataforseo.com/apis"
);
try {
   // POST /v3/on_page/raw_html
   // the full list of possible parameters is available in documentation
   $result = $client->post('/v3/on_page/raw_html', $post_array);
   print_r($result);
   // do something with post result
} catch (RestClientException $e) {
   echo "\n";
   print "HTTP code: {$e->getHttpCode()}\n";
   print "Error code: {$e->getCode()}\n";
   print "Message: {$e->getMessage()}\n";
   print $e->getTraceAsString();
   echo "\n";
}

$client = null;
?>

The above command returns JSON structured like this:

{
  "version": "0.1.20200805",
  "status_code": 20000,
  "status_message": "Ok.",
  "time": "0.0896 sec.",
  "cost": 0,
  "tasks_count": 1,
  "tasks_error": 0,
  "tasks": [
    {
      "id": "07281559-0695-0216-0000-c269be8b7592",
      "status_code": 20000,
      "status_message": "Ok.",
      "time": "0.0214 sec.",
      "cost": 0,
      "result_count": 1,
      "path": [
        "v3",
        "on_page",
        "raw_html"
      ],
      "data": {
        "api": "on_page",
        "function": "raw_html",
        "url": "https://dataforseo.com/apis"
      },
      "result": [
        {
          "crawl_progress": "in_progress",
          "crawl_status": {
            "max_crawl_pages": 10,
            "pages_in_queue": 0,
            "pages_crawled": 10
          },
          "items_count": 1,
          "items": {
            "html": "<!doctype html><html><body><head></head></body></html>"
          }
        }
      ]
    }
  ]
}

All POST data must be sent in JSON format with UTF-8 encoding. Tasks are set using the POST method. When setting a task, send all of its parameters in the task array of the generic POST array.
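
The encoding requirement above can be sketched in Python as follows. The helper name `build_raw_html_body` and the `Content-Type` header value are assumptions for illustration; the `id` and `url` fields come from the PHP sample on this page. The sketch only builds the request body and does not send it.

```python
import json


def build_raw_html_body(task_id, url):
    """Return the POST body for /v3/on_page/raw_html as UTF-8 encoded JSON bytes."""
    # The generic POST array holds one object per task.
    tasks = [{"id": task_id, "url": url}]
    # ensure_ascii=False keeps non-ASCII characters as-is, so the explicit
    # UTF-8 encoding step below is what determines the bytes on the wire.
    return json.dumps(tasks, ensure_ascii=False).encode("utf-8")


# Assumed header: the page only states that the data must be UTF-8 JSON.
HEADERS = {"Content-Type": "application/json; charset=utf-8"}

if __name__ == "__main__":
    body = build_raw_html_body(
        "07281559-0695-0216-0000-c269be8b7592",
        "https://dataforseo.com/apis",
    )
    print(body.decode("utf-8"))
```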

Description of the fields for setting a task:

Field name    Type      Description
id            string    ID of the task
                        required field
                        you can get this ID in the response of the Task POST endpoint
                        example: "07131248-1535-0216-1000-17384017ad04"
url           string    page URL
                        required field
                        the absolute URL of the page whose HTML you want to retrieve
                        Note: this field is optional if the task was set using the Instant Pages endpoint
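
The field rules above could be checked before sending a request with a small validator like the one below. This is a sketch under assumptions: the function name `validate_task`, the `url_required` flag (to model the Instant Pages exception), and the restriction to `http`/`https` schemes are illustrative choices, not rules stated by the API.

```python
from urllib.parse import urlparse


def validate_task(task, url_required=True):
    """Return a list of problems found in one task dict (empty if it looks valid)."""
    problems = []

    task_id = task.get("id")
    if not isinstance(task_id, str) or not task_id:
        problems.append("id is required and must be a non-empty string")

    url = task.get("url")
    if url is None:
        # url may be omitted only for tasks set via the Instant Pages endpoint.
        if url_required:
            problems.append("url is required")
    else:
        parsed = urlparse(url) if isinstance(url, str) else None
        # Assumption: "absolute URL" is taken to mean an http(s) URL with a host.
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url must be an absolute http(s) URL")

    return problems


if __name__ == "__main__":
    print(validate_task({"id": "07131248-1535-0216-1000-17384017ad04", "url": "/apis"}))
```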

In response, the API server returns JSON-encoded data containing a tasks array with information specific to the tasks you set.

Description of the fields in the response:

Field name                    Type     Description
version                       string   the current version of the API
status_code                   integer  general status code
                                       you can find the full list of the response codes here
                                       Note: we strongly recommend building a system for handling the related exceptional or error conditions
status_message                string   general informational message
                                       you can find the full list of general informational messages here
time                          string   execution time, seconds
cost                          float    total tasks cost, USD
tasks_count                   integer  the number of tasks in the tasks array
tasks_error                   integer  the number of tasks in the tasks array returned with an error
tasks                         array    array of tasks
    id                        string   task identifier
                                       unique task identifier in our system in the UUID format
    status_code               integer  status code of the task
                                       generated by DataForSEO; can be within the following range: 10000-60000
                                       you can find the full list of the response codes here
    status_message            string   informational message of the task
                                       you can find the full list of general informational messages here
    time                      string   execution time, seconds
    cost                      float    cost of the task, USD
    result_count              integer  number of elements in the result array
    path                      array    URL path
    data                      object   contains the same parameters that you specified in the POST request
    result                    array    array of results
        crawl_progress        string   status of the crawling session
                                       possible values: in_progress, finished
        crawl_status          object   details of the crawling session
            max_crawl_pages   integer  maximum number of pages to crawl
                                       indicates the max_crawl_pages limit you specified when setting a task
            pages_in_queue    integer  number of pages that are currently in the crawling queue
            pages_crawled     integer  number of crawled pages
        items_count           integer  number of items in the items array
        items                 array    items array
            html              string   HTML of the page
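
To show how the fields above fit together, here is a hedged Python sketch that pulls the HTML out of a decoded response. The names `ApiError` and `extract_html` are hypothetical, and treating 20000 as the only success code is an assumption based on the sample response on this page. Because the sample shows `items` as an object while the table lists it as an array, the sketch accepts both shapes.

```python
class ApiError(Exception):
    """Raised when the response or one of its tasks reports a non-OK status."""


OK = 20000  # assumption: the success code shown in the sample response


def extract_html(response):
    """Return one dict per HTML item found in a decoded raw_html response."""
    if response.get("status_code") != OK:
        raise ApiError(f"{response.get('status_code')}: {response.get('status_message')}")

    pages = []
    for task in response.get("tasks") or []:
        if task.get("status_code") != OK:
            raise ApiError(
                f"task {task.get('id')}: {task.get('status_code')}: {task.get('status_message')}"
            )
        for result in task.get("result") or []:
            items = result.get("items") or []
            # Accept both a single object and an array of objects.
            if isinstance(items, dict):
                items = [items]
            for item in items:
                pages.append(
                    {
                        "task_id": task.get("id"),
                        "crawl_progress": result.get("crawl_progress"),
                        "html": item.get("html"),
                    }
                )
    return pages
```

A caller would typically decode the HTTP response body with `json.loads` first and pass the resulting dict to `extract_html`, catching `ApiError` to handle the error conditions the status code note refers to.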
