OnPage API Non-indexable Pages

This endpoint returns a list of pages that are blocked from indexing by Google and other search engines through robots.txt, HTTP headers, or meta tag settings.

Instead of 'login' and 'password', use your credentials from https://app.dataforseo.com/api-dashboard

<?php
// You can download this file from here https://cdn.dataforseo.com/v3/examples/php/php_RestClient.zip
require('RestClient.php');
$api_url = 'https://api.dataforseo.com/';
// Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-dashboard
$client = new RestClient($api_url, null, 'login', 'password');

$post_array = array();
// simple way to get a result
$post_array[] = array(
   "id" => "07281559-0695-0216-0000-c269be8b7592",
   "filters" => [
      ["reason", "=", "robots_txt"],
      "and",
      ["url", "like", "%go%"]
   ],
   "limit" => 10
);
try {
   // POST /v3/on_page/non_indexable
   // the full list of possible parameters is available in documentation
   $result = $client->post('/v3/on_page/non_indexable', $post_array);
   print_r($result);
   // do something with post result
} catch (RestClientException $e) {
   echo "\n";
   print "HTTP code: {$e->getHttpCode()}\n";
   print "Error code: {$e->getCode()}\n";
   print "Message: {$e->getMessage()}\n";
   print $e->getTraceAsString();
   echo "\n";
}
$client = null;
?>

The above command returns JSON structured like this:

{
  "version": "0.1.20200805",
  "status_code": 20000,
  "status_message": "Ok.",
  "time": "0.1075 sec.",
  "cost": 0,
  "tasks_count": 1,
  "tasks_error": 0,
  "tasks": [
    {
      "id": "07281559-0695-0216-0000-c269be8b7592",
      "status_code": 20000,
      "status_message": "Ok.",
      "time": "0.0236 sec.",
      "cost": 0,
      "result_count": 1,
      "path": [
        "v3",
        "on_page",
        "non_indexable"
      ],
      "data": {
        "api": "on_page",
        "function": "non_indexable"
      },
      "result": [
        {
          "crawl_progress": "finished",
          "crawl_status": {
            "max_crawl_pages": 10,
            "pages_in_queue": 0,
            "pages_crawled": 10
          },
          "total_items_count": 3,
          "items_count": 2,
          "items": [
            {
              "reason": "robots_txt",
              "url": "https://dataforseo.com/go/"
            },
            {
              "reason": "robots_txt",
              "url": "https://dataforseo.com/wp-admin/"
            }
          ]
        }
      ]
    }
  ]
}

All POST data must be sent in JSON format (UTF-8 encoding). Tasks are set using the POST method. When setting a task, send all task parameters in the task array of the generic POST array.
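As a rough illustration of the request shape described above, the following Python sketch serializes one task object into a JSON array encoded as UTF-8. The helper name `build_request_body` is hypothetical; the task ID, filters, and limit are copied from the PHP example. It only builds the body and does not send anything.

```python
import json


def build_request_body(tasks):
    """Serialize a list of task dicts into a UTF-8 encoded JSON array.

    Hypothetical helper for illustration; not part of any DataForSEO client.
    """
    if not isinstance(tasks, list):
        raise TypeError("tasks must be a list of task objects")
    # ensure_ascii=False keeps non-ASCII characters as real UTF-8 bytes
    return json.dumps(tasks, ensure_ascii=False).encode("utf-8")


# One task object per array element; values copied from the PHP example.
task = {
    "id": "07281559-0695-0216-0000-c269be8b7592",
    "filters": [["reason", "=", "robots_txt"], "and", ["url", "like", "%go%"]],
    "limit": 10,
}
body = build_request_body([task])
print(body.decode("utf-8"))
```

The resulting bytes would be the body of a POST to /v3/on_page/non_indexable, sent with your API credentials.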

Description of the fields for setting a task:

Field name Type Description
id string ID of the task
required field
you can get this ID in the response of the Task POST endpoint
example:
"07131248-1535-0216-1000-17384017ad04"
limit integer the maximum number of returned pages
optional field
default value: 100
maximum value: 1000
offset integer offset in the results array of returned pages
optional field
default value: 0
if you specify 10, the first ten pages in the results array are omitted and data is returned for the pages that follow
filters array array of results filtering parameters
optional field
you can add several filters at once (8 filters maximum)
conditions must be joined by a logical operator: and, or
the following operators are supported:
regex, <, <=, >, >=, =, <>, in, not_in, like, not_like
you can use the % wildcard with like and not_like to match any string of zero or more characters
examples:
["reason","=","robots_txt"]

[["reason","<>","robots_txt"],
"and",
["url","not_like","%/wp-admin/%"]]

[["url","not_like","%/wp-admin/%"],
"and",
[["reason","<>","meta_tag"],"or",["reason","<>","http_header"]]]

The full list of possible filters is available at this link.
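The field rules above can be checked on the client before a request is sent. The Python sketch below is one possible way to do that, not an official validator: the names `count_conditions` and `build_task` are hypothetical, and it assumes that each `[field, operator, value]` condition counts as one filter toward the limit of 8 and that `limit` must be at least 1 (the source only states the maximum of 1000).

```python
LOGICAL = {"and", "or"}
OPERATORS = {"regex", "<", "<=", ">", ">=", "=", "<>",
             "in", "not_in", "like", "not_like"}
MAX_FILTERS = 8   # "8 filters maximum"; assumed to mean 8 conditions
MAX_LIMIT = 1000  # documented maximum for limit


def is_condition(node):
    """True for a single [field, operator, value] condition."""
    return (
        isinstance(node, list)
        and len(node) == 3
        and isinstance(node[0], str)
        and isinstance(node[1], str)
        and node[1] in OPERATORS
    )


def count_conditions(node):
    """Count conditions in a filter expression, checking its shape."""
    if is_condition(node):
        return 1
    # Otherwise expect: expression, logical operator, expression, ...
    if not isinstance(node, list) or len(node) < 3 or len(node) % 2 == 0:
        raise ValueError(f"malformed filter expression: {node!r}")
    total = 0
    for index, part in enumerate(node):
        if index % 2 == 1:
            if not isinstance(part, str) or part not in LOGICAL:
                raise ValueError(f"expected 'and' or 'or', got {part!r}")
        else:
            total += count_conditions(part)
    return total


def build_task(task_id, filters=None, limit=100, offset=0):
    """Build one task object, applying the documented defaults and limits."""
    if not 1 <= limit <= MAX_LIMIT:  # lower bound of 1 is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must not be negative")
    task = {"id": task_id, "limit": limit, "offset": offset}
    if filters is not None:
        if count_conditions(filters) > MAX_FILTERS:
            raise ValueError(f"at most {MAX_FILTERS} filters are allowed")
        task["filters"] = filters
    return task


# Nested example taken from the field description above.
nested = [
    ["url", "not_like", "%/wp-admin/%"],
    "and",
    [["reason", "<>", "meta_tag"], "or", ["reason", "<>", "http_header"]],
]
task = build_task("07131248-1535-0216-1000-17384017ad04", nested, limit=10)
print(count_conditions(nested), task)
```

The check is purely structural: it does not verify that a field name is filterable, which is what the full list of possible filters is for.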

In response, the API server returns JSON-encoded data containing a tasks array with information specific to the tasks you set.
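A minimal sketch of reading such a response in Python follows. It assumes that 20000 is the success code, because that is the value paired with "Ok." in the sample response; the names `ApiError` and `extract_items` are hypothetical. The embedded JSON is a trimmed copy of the sample response, so the sketch runs without network access.

```python
import json

OK = 20000  # paired with "Ok." in the sample response; assumed to mean success


class ApiError(Exception):
    """Raised when the response or one of its tasks reports a non-OK status."""


def extract_items(response):
    """Return all items from every task result, or raise ApiError."""
    if response.get("status_code") != OK:
        raise ApiError(f"{response.get('status_code')}: "
                       f"{response.get('status_message')}")
    items = []
    for task in response.get("tasks") or []:
        # Each task carries its own status, separate from the general one.
        if task.get("status_code") != OK:
            raise ApiError(f"task {task.get('id')}: {task.get('status_code')} "
                           f"{task.get('status_message')}")
        for result in task.get("result") or []:
            items.extend(result.get("items") or [])
    return items


# Trimmed copy of the sample response shown earlier.
raw = """
{
  "status_code": 20000,
  "status_message": "Ok.",
  "tasks": [
    {
      "id": "07281559-0695-0216-0000-c269be8b7592",
      "status_code": 20000,
      "status_message": "Ok.",
      "result": [
        {
          "crawl_progress": "finished",
          "total_items_count": 3,
          "items_count": 2,
          "items": [
            {"reason": "robots_txt", "url": "https://dataforseo.com/go/"},
            {"reason": "robots_txt", "url": "https://dataforseo.com/wp-admin/"}
          ]
        }
      ]
    }
  ]
}
"""
items = extract_items(json.loads(raw))
urls = [item["url"] for item in items]
print(urls)
```

Raising on the first failed task is a design choice for brevity; a caller that sends several tasks at once may prefer to collect errors per task instead.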

Description of the fields in the results array:

Field name Type Description
version string the current version of the API
status_code integer general status code
you can find the full list of the response codes here
Note: we strongly recommend building error handling for the related exceptional and error conditions
status_message string general informational message
you can find the full list of general informational messages here
time string execution time, seconds
cost float total tasks cost, USD
tasks_count integer the number of tasks in the tasks array
tasks_error integer the number of tasks in the tasks array returned with an error
tasks array array of tasks
        id string task identifier
unique task identifier in our system in the UUID format
        status_code integer status code of the task
generated by DataForSEO; can be within the following range: 10000-60000
you can find the full list of the response codes here
        status_message string informational message of the task
you can find the full list of general informational messages here
        time string execution time, seconds
        cost float cost of the task, USD
        result_count integer number of elements in the result array
        path array URL path
        data object contains the same parameters that you specified in the POST request
        result array array of results
            crawl_progress string status of the crawling session
possible values: in_progress, finished
            crawl_status object details of the crawling session
               max_crawl_pages integer maximum number of pages to crawl
indicates the max_crawl_pages limit you specified when setting a task
               pages_in_queue integer number of pages that are currently in the crawling queue
               pages_crawled integer number of crawled pages
            total_items_count integer total number of relevant items in the database
            items_count integer number of items in the results array
            items array items array
              reason string the reason why the page is non-indexable
can take the following values: robots_txt, meta_tag, http_header, attribute, too_many_redirects
              url string URL of the non-indexable page
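Because limit caps a single response at 1000 pages, larger result sets have to be read in several requests. The Python sketch below shows one way to combine offset with total_items_count for that. The function `fetch_all_items` and the callable `fetch_page(limit, offset)` are hypothetical: `fetch_page` stands for whatever code performs the POST and returns the first element of the task's result array. The data here is made up so the loop can run offline.

```python
def fetch_all_items(fetch_page, page_size=100):
    """Collect items page by page until total_items_count is reached.

    Returns (items, finished), where finished reflects crawl_progress in the
    last page; an unfinished crawl may still produce more items later.
    """
    items = []
    offset = 0
    while True:
        result = fetch_page(page_size, offset)
        page = result.get("items") or []
        items.extend(page)
        offset += len(page)
        total = result.get("total_items_count") or 0
        # Stop on an empty page as well, to avoid looping forever.
        if not page or offset >= total:
            finished = result.get("crawl_progress") == "finished"
            return items, finished


# Made-up data standing in for the API, so the sketch runs offline.
FAKE_ITEMS = [
    {"reason": "robots_txt", "url": "https://example.com/a"},
    {"reason": "meta_tag", "url": "https://example.com/b"},
    {"reason": "http_header", "url": "https://example.com/c"},
]
calls = []


def fake_fetch_page(limit, offset):
    """Stand-in for a real request; records the limit and offset it was given."""
    calls.append((limit, offset))
    page = FAKE_ITEMS[offset:offset + limit]
    return {
        "crawl_progress": "finished",
        "total_items_count": len(FAKE_ITEMS),
        "items_count": len(page),
        "items": page,
    }


all_items, finished = fetch_all_items(fake_fetch_page, page_size=2)
print(len(all_items), finished, calls)
```

With a page size of 2 and three items, the loop issues two requests, at offsets 0 and 2.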
