OnPage API Non-indexable Pages
This endpoint returns a list of pages that are blocked from being indexed by Google and other search engines through robots.txt, HTTP headers, or meta tag settings.
Instead of ‘login’ and ‘password’, use your credentials from https://app.dataforseo.com/api-access
```php
<?php
// You can download this file from here https://cdn.dataforseo.com/v3/examples/php/php_RestClient.zip
require('RestClient.php');
$api_url = 'https://api.dataforseo.com/';
// Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-access
$client = new RestClient($api_url, null, 'login', 'password');

$post_array = array();
// simple way to get a result
$post_array[] = array(
    "id" => "07281559-0695-0216-0000-c269be8b7592",
    "filters" => [
        ["reason", "=", "robots_txt"],
        "and",
        ["url", "like", "%go%"]
    ],
    "limit" => 10
);
try {
    // POST /v3/on_page/non_indexable
    // the full list of possible parameters is available in documentation
    $result = $client->post('/v3/on_page/non_indexable', $post_array);
    print_r($result);
    // do something with post result
} catch (RestClientException $e) {
    echo "\n";
    print "HTTP code: {$e->getHttpCode()}\n";
    print "Error code: {$e->getCode()}\n";
    print "Message: {$e->getMessage()}\n";
    print $e->getTraceAsString();
    echo "\n";
}
$client = null;
?>
```
The above command returns JSON structured like this:
```json
{
  "version": "0.1.20200805",
  "status_code": 20000,
  "status_message": "Ok.",
  "time": "0.1075 sec.",
  "cost": 0,
  "tasks_count": 1,
  "tasks_error": 0,
  "tasks": [
    {
      "id": "07281559-0695-0216-0000-c269be8b7592",
      "status_code": 20000,
      "status_message": "Ok.",
      "time": "0.0236 sec.",
      "cost": 0,
      "result_count": 1,
      "path": [
        "v3",
        "on_page",
        "non_indexable"
      ],
      "data": {
        "api": "on_page",
        "function": "non_indexable"
      },
      "result": [
        {
          "crawl_progress": "finished",
          "crawl_status": {
            "max_crawl_pages": 10,
            "pages_in_queue": 0,
            "pages_crawled": 10
          },
          "total_items_count": 3,
          "items_count": 2,
          "items": [
            {
              "reason": "robots_txt",
              "url": "https://dataforseo.com/go/"
            },
            {
              "reason": "robots_txt",
              "url": "https://dataforseo.com/wp-admin/"
            }
          ]
        }
      ]
    }
  ]
}
```
All POST data should be sent in JSON format (UTF-8 encoding). Tasks are set using the POST method. When setting a task, send all task parameters in the task array of the generic POST array.
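As a minimal sketch of the request body described above, the snippet below wraps a single task in the outer POST array and encodes it as UTF-8 JSON. The helper name `encodeTasks` is hypothetical and not part of the DataForSEO client; the task ID is the one from the example request.

```php
<?php
// Hypothetical helper: encode the generic POST array (a list of task
// objects) as a UTF-8 JSON string suitable for the request body.
function encodeTasks(array $tasks): string
{
    // array_values() guarantees a JSON array ([...]) rather than a JSON
    // object, even if the caller passed non-sequential keys.
    return json_encode(
        array_values($tasks),
        JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE
    );
}

$body = encodeTasks([
    ['id' => '07281559-0695-0216-0000-c269be8b7592', 'limit' => 10],
]);

// prints [{"id":"07281559-0695-0216-0000-c269be8b7592","limit":10}]
echo $body, "\n";
```

Each element of the outer array is one task, so several tasks can be sent by adding more elements to the array passed to `encodeTasks`.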
Description of the fields for setting a task:
Field name | Type | Description |
---|---|---|
id | string | ID of the task; required field. You can get this ID in the response of the Task POST endpoint. Example: "07131248-1535-0216-1000-17384017ad04" |
limit | integer | the maximum number of returned pages; optional field. Default value: 100; maximum value: 1000 |
offset | integer | offset in the results array of returned pages; optional field. Default value: 0. If you specify the value 10, the first ten pages in the results array will be omitted and data will be provided for the subsequent pages |
filters | array | array of results filtering parameters; optional field. You can add several filters at once (8 filters maximum). Set a logical operator, `and` or `or`, between the conditions. The following operators are supported: `regex`, `not_regex`, `<`, `<=`, `>`, `>=`, `=`, `<>`, `in`, `not_in`, `like`, `not_like`. You can use the `%` operator with `like` and `not_like` to match any string of zero or more characters. Example: `["reason","=","robots_txt"]` `[["reason","<>","robots_txt"],` The full list of possible filters is available by this link. |
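The parameter limits in the table above can be checked before a task is sent. The following is a sketch only: `countConditions` and `buildNonIndexableTask` are hypothetical helpers, the lower bound of 1 for `limit` is an assumption, and so is the reading that "8 filters maximum" counts individual conditions.

```php
<?php
// Count individual conditions in a filters array. A single condition is
// ["field", "operator", value]; several are joined by "and" / "or" strings.
function countConditions(array $filters): int
{
    if ($filters === []) {
        return 0;
    }
    if (is_string($filters[0] ?? null)) {
        return 1; // a lone condition such as ["reason", "=", "robots_txt"]
    }
    $count = 0;
    foreach ($filters as $part) {
        if (is_array($part)) {
            $count += countConditions($part);
        }
    }
    return $count;
}

// Assemble one task for /v3/on_page/non_indexable and enforce the limits.
function buildNonIndexableTask(string $id, array $filters = [], int $limit = 100, int $offset = 0): array
{
    // Upper bound comes from the table; the lower bound of 1 is an assumption.
    if ($limit < 1 || $limit > 1000) {
        throw new InvalidArgumentException('limit must be between 1 and 1000');
    }
    if ($offset < 0) {
        throw new InvalidArgumentException('offset must not be negative');
    }
    if (countConditions($filters) > 8) {
        throw new InvalidArgumentException('at most 8 filters are allowed');
    }
    $task = ['id' => $id, 'limit' => $limit, 'offset' => $offset];
    if ($filters !== []) {
        $task['filters'] = $filters;
    }
    return $task;
}

// Second page of ten results: offset 10 skips the first ten pages.
$task = buildNonIndexableTask(
    '07281559-0695-0216-0000-c269be8b7592',
    [['reason', '=', 'robots_txt'], 'and', ['url', 'like', '%go%']],
    10,
    10
);
print_r($task);
```

The returned array can be appended to `$post_array` in the request example as one task.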
In response, the API server returns JSON-encoded data containing a `tasks` array with the information specific to the set tasks.
Description of the fields in the results array:
Field name | Type | Description |
---|---|---|
version | string | the current version of the API |
status_code | integer | general status code. You can find the full list of the response codes here. Note: we strongly recommend designing a system for handling related exceptions and error conditions |
status_message | string | general informational message. You can find the full list of general informational messages here |
time | string | execution time, seconds |
cost | float | total tasks cost, USD |
tasks_count | integer | the number of tasks in the tasks array |
tasks_error | integer | the number of tasks in the tasks array returned with an error |
tasks | array | array of tasks |
id | string | task identifier; unique task identifier in our system in the UUID format |
status_code | integer | status code of the task generated by DataForSEO; can be within the following range: 10000-60000. You can find the full list of the response codes here |
status_message | string | informational message of the task. You can find the full list of general informational messages here |
time | string | execution time, seconds |
cost | float | cost of the task, USD |
result_count | integer | number of elements in the result array |
path | array | URL path |
data | object | contains the same parameters that you specified in the POST request |
result | array | array of results |
crawl_progress | string | status of the crawling session. Possible values: `in_progress`, `finished` |
crawl_status | object | details of the crawling session |
max_crawl_pages | integer | maximum number of pages to crawl; indicates the max_crawl_pages limit you specified when setting a task |
pages_in_queue | integer | number of pages that are currently in the crawling queue |
pages_crawled | integer | number of crawled pages |
total_items_count | integer | total number of relevant items in the database |
items_count | integer | number of items in the results array |
items | array | items array |
reason | string | the reason why the page is non-indexable. Can take the following values: `robots_txt`, `meta_tag`, `http_header`, `attribute`, `too_many_redirects` |
url | string | URL of the non-indexable page |
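One way to consume the response fields described above is sketched below: check the general and per-task status codes, then group the returned URLs by `reason`. The helper `groupByReason` is hypothetical, and treating 20000 as the only success code is an assumption based on the sample response. The JSON is a trimmed copy of that sample.

```php
<?php
// Hypothetical helper: collect non-indexable URLs from a decoded response,
// grouped by the reason each page is non-indexable.
function groupByReason(array $response): array
{
    // Assumption: 20000 ("Ok.") is the success code, as in the sample response.
    if (($response['status_code'] ?? null) !== 20000) {
        throw new RuntimeException(
            'Request failed: ' . ($response['status_message'] ?? 'unknown error')
        );
    }
    $grouped = [];
    foreach ($response['tasks'] ?? [] as $task) {
        if (($task['status_code'] ?? null) !== 20000) {
            continue; // skip tasks that returned an error
        }
        foreach ($task['result'] ?? [] as $result) {
            foreach ($result['items'] ?? [] as $item) {
                $grouped[$item['reason']][] = $item['url'];
            }
        }
    }
    return $grouped;
}

$json = <<<'JSON'
{
  "status_code": 20000,
  "status_message": "Ok.",
  "tasks": [
    {
      "status_code": 20000,
      "result": [
        {
          "items": [
            { "reason": "robots_txt", "url": "https://dataforseo.com/go/" },
            { "reason": "robots_txt", "url": "https://dataforseo.com/wp-admin/" }
          ]
        }
      ]
    }
  ]
}
JSON;

$response = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
$pages = groupByReason($response);
print_r($pages);
```

Here tasks with a non-success status code are skipped silently; a production client would more likely log or rethrow them, in line with the note about error handling in the table.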