OnPage API Non-indexable Pages
This endpoint returns a list of pages that are blocked from being indexed by Google and other search engines through robots.txt, HTTP headers, or meta tag settings.
Instead of ‘login’ and ‘password’ use your credentials from https://app.dataforseo.com/api-access
<?php
// You can download this file from here https://cdn.dataforseo.com/v3/examples/php/php_RestClient.zip
require('RestClient.php');
$api_url = 'https://api.dataforseo.com/';
// Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-access
$client = new RestClient($api_url, null, 'login', 'password');
$post_array = array();
// simple way to get a result
$post_array[] = array(
    "id" => "07281559-0695-0216-0000-c269be8b7592",
    "filters" => [
        ["reason", "=", "robots_txt"],
        "and",
        ["url", "like", "%go%"]
    ],
    "limit" => 10
);
try {
    // POST /v3/on_page/non_indexable
    // the full list of possible parameters is available in documentation
    $result = $client->post('/v3/on_page/non_indexable', $post_array);
    print_r($result);
    // do something with post result
} catch (RestClientException $e) {
    echo "\n";
    print "HTTP code: {$e->getHttpCode()}\n";
    print "Error code: {$e->getCode()}\n";
    print "Message: {$e->getMessage()}\n";
    print $e->getTraceAsString();
    echo "\n";
}
$client = null;
?>
The above command returns JSON structured like this:
{
  "version": "0.1.20200805",
  "status_code": 20000,
  "status_message": "Ok.",
  "time": "0.1075 sec.",
  "cost": 0,
  "tasks_count": 1,
  "tasks_error": 0,
  "tasks": [
    {
      "id": "07281559-0695-0216-0000-c269be8b7592",
      "status_code": 20000,
      "status_message": "Ok.",
      "time": "0.0236 sec.",
      "cost": 0,
      "result_count": 1,
      "path": [
        "v3",
        "on_page",
        "non_indexable"
      ],
      "data": {
        "api": "on_page",
        "function": "non_indexable"
      },
      "result": [
        {
          "crawl_progress": "finished",
          "crawl_status": {
            "max_crawl_pages": 10,
            "pages_in_queue": 0,
            "pages_crawled": 10
          },
          "total_items_count": 3,
          "items_count": 2,
          "items": [
            {
              "reason": "robots_txt",
              "url": "https://dataforseo.com/go/"
            },
            {
              "reason": "robots_txt",
              "url": "https://dataforseo.com/wp-admin/"
            }
          ]
        }
      ]
    }
  ]
}
All POST data should be sent in JSON format (UTF-8 encoding). Tasks are set using the POST method. When setting a task, send all task parameters in the task array of the generic POST array.
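
The request shape described above can be sketched roughly as follows: one parameter set per task, wrapped in an outer array and encoded as JSON. The helper name `encode_tasks` is hypothetical and is not part of RestClient.php; the task ID is the one used in the sample request.

```php
<?php
// Hypothetical helper (not part of RestClient.php): wraps task parameter
// sets in the outer POST array and encodes the whole body as UTF-8 JSON.
function encode_tasks(array $tasks): string
{
    // array_values() re-indexes the list so the output is a JSON array ([...]),
    // not a JSON object keyed by arbitrary indexes.
    return json_encode(
        array_values($tasks),
        JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );
}

$body = encode_tasks([
    ["id" => "07281559-0695-0216-0000-c269be8b7592", "limit" => 10],
]);
echo $body, "\n";
```

The PHP sample above passes the plain PHP array to `$client->post()`, so a pre-encoded string like this is mainly useful when sending the request with another HTTP client.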
Description of the fields for setting a task:
| Field name | Type | Description |
|---|---|---|
| id | string | ID of the task; required field. You can get this ID in the response of the Task POST endpoint. Example: "07131248-1535-0216-1000-17384017ad04" |
| limit | integer | the maximum number of returned pages; optional field. Default value: 100; maximum value: 1000 |
| offset | integer | offset in the results array of returned pages; optional field. Default value: 0. If you specify the value 10, the first ten pages in the results array will be omitted and data will be provided for the subsequent pages |
| filters | array | array of results filtering parameters; optional field. You can add several filters at once (8 filters maximum). Set a logical operator (and, or) between the conditions. The following operators are supported: regex, not_regex, <, <=, >, >=, =, <>, in, not_in, like, not_like. You can use the % operator with like and not_like to match any string of zero or more characters. Example: ["reason","=","robots_txt"] [["reason","<>","robots_txt"], The full list of possible filters is available in the documentation. |
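
The filter rules in the table above can be checked on the client side before a request is sent. The following is a minimal sketch under stated assumptions: `validate_filters`, `SUPPORTED_OPERATORS`, and `MAX_FILTERS` are hypothetical names, and only flat expressions (conditions separated by "and" / "or") are handled.

```php
<?php
// Hypothetical client-side check, based only on the rules listed in the table.
const SUPPORTED_OPERATORS = [
    'regex', 'not_regex', '<', '<=', '>', '>=', '=', '<>',
    'in', 'not_in', 'like', 'not_like',
];
const MAX_FILTERS = 8;

// Returns the number of conditions, or throws if the expression is malformed.
function validate_filters(array $filters): int
{
    // A single condition looks like ["reason", "=", "robots_txt"];
    // several conditions are arrays separated by "and" / "or".
    $single = isset($filters[0]) && is_string($filters[0]);
    $parts = $single ? [$filters] : array_values($filters);
    $count = 0;
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even positions must hold a [field, operator, value] condition.
            if (!is_array($part) || count($part) !== 3
                || !in_array($part[1], SUPPORTED_OPERATORS, true)) {
                throw new InvalidArgumentException("Invalid condition at position $i");
            }
            $count++;
        } elseif (!in_array($part, ['and', 'or'], true)) {
            // Odd positions must hold a logical operator.
            throw new InvalidArgumentException("Expected \"and\" or \"or\" at position $i");
        }
    }
    // Reject empty expressions, trailing operators, and more than 8 conditions.
    if ($count === 0 || count($parts) % 2 === 0 || $count > MAX_FILTERS) {
        throw new InvalidArgumentException('Filter expression is empty, incomplete, or too long');
    }
    return $count;
}

echo validate_filters([["reason", "=", "robots_txt"], "and", ["url", "like", "%go%"]]), "\n";
```

The API remains the authority on what it accepts; a check like this only catches obvious mistakes early.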
In response, the API server returns JSON-encoded data containing a tasks array with information specific to the tasks you set.
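
Because both the response as a whole and each task carry their own status code, it can help to separate successful tasks from failed ones before reading results. The sketch below assumes that 20000 signals success, as in the sample response; `split_tasks` and `STATUS_OK` are hypothetical names, and the JSON string is a shortened copy of the sample.

```php
<?php
// Assumption: 20000 means success, as in the sample response above.
const STATUS_OK = 20000;

// Hypothetical helper: returns results of successful tasks keyed by task ID
// and collects the status messages of failed tasks.
function split_tasks(array $response): array
{
    if (($response['status_code'] ?? null) !== STATUS_OK) {
        $message = $response['status_message'] ?? 'unknown error';
        throw new RuntimeException('Request failed: ' . $message);
    }
    $ok = [];
    $errors = [];
    foreach ($response['tasks'] ?? [] as $task) {
        $id = $task['id'] ?? '?';
        if (($task['status_code'] ?? null) === STATUS_OK) {
            $ok[$id] = $task['result'] ?? [];
        } else {
            $errors[$id] = $task['status_message'] ?? '';
        }
    }
    return ['ok' => $ok, 'errors' => $errors];
}

// Shortened copy of the sample response, decoded into an associative array.
$response = json_decode(
    '{"status_code":20000,"status_message":"Ok.","tasks":[{"id":"07281559-0695-0216-0000-c269be8b7592",'
    . '"status_code":20000,"status_message":"Ok.","result":[{"crawl_progress":"finished","items_count":2}]}]}',
    true
);
$split = split_tasks($response);
echo count($split['ok']), " task(s) succeeded, ", count($split['errors']), " failed\n";
```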
Description of the fields in the results array:
| Field name | Type | Description |
|---|---|---|
| version | string | the current version of the API |
| status_code | integer | general status code; the full list of response codes is available in the documentation. Note: we strongly recommend implementing handling for related exceptional or error conditions |
| status_message | string | general informational message; the full list of general informational messages is available in the documentation |
| time | string | execution time, seconds |
| cost | float | total tasks cost, USD |
| tasks_count | integer | the number of tasks in the tasks array |
| tasks_error | integer | the number of tasks in the tasks array returned with an error |
| tasks | array | array of tasks |
| id | string | task identifier; unique task identifier in our system in the UUID format |
| status_code | integer | status code of the task generated by DataForSEO; can be within the following range: 10000-60000. The full list of response codes is available in the documentation |
| status_message | string | informational message of the task; the full list of general informational messages is available in the documentation |
| time | string | execution time, seconds |
| cost | float | cost of the task, USD |
| result_count | integer | number of elements in the result array |
| path | array | URL path |
| data | object | contains the same parameters that you specified in the POST request |
| result | array | array of results |
| crawl_progress | string | status of the crawling session; possible values: in_progress, finished |
| crawl_status | object | details of the crawling session |
| max_crawl_pages | integer | maximum number of pages to crawl; indicates the max_crawl_pages limit you specified when setting a task |
| pages_in_queue | integer | number of pages that are currently in the crawling queue |
| pages_crawled | integer | number of crawled pages |
| total_items_count | integer | total number of relevant items in the database |
| items_count | integer | number of items in the results array |
| items | array | items array |
| reason | string | the reason why the page is non-indexable; can take the following values: robots_txt, meta_tag, http_header, attribute, too_many_redirects |
| url | string | URL of the non-indexable page |
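
As an illustration of how the fields above might be consumed, the sketch below groups the returned URLs by reason and computes the offset values needed to page through all items. It assumes that total_items_count counts every item matching the request and that further pages are fetched by repeating the request with a larger offset; `group_by_reason` and `page_offsets` are hypothetical helper names, and the data is taken from the sample response.

```php
<?php
// Hypothetical helper: groups the URLs of one result element by "reason".
function group_by_reason(array $result): array
{
    $groups = [];
    foreach ($result['items'] ?? [] as $item) {
        $groups[$item['reason']][] = $item['url'];
    }
    return $groups;
}

// Hypothetical helper: lists the offsets needed to fetch $total items
// in pages of $limit items (limit must be within the documented 1..1000).
function page_offsets(int $total, int $limit): array
{
    if ($limit < 1 || $limit > 1000) {
        throw new InvalidArgumentException('limit must be between 1 and 1000');
    }
    $offsets = [];
    for ($offset = 0; $offset < $total; $offset += $limit) {
        $offsets[] = $offset;
    }
    return $offsets;
}

// One element of the "result" array, copied from the sample response.
$result = [
    'total_items_count' => 3,
    'items_count' => 2,
    'items' => [
        ['reason' => 'robots_txt', 'url' => 'https://dataforseo.com/go/'],
        ['reason' => 'robots_txt', 'url' => 'https://dataforseo.com/wp-admin/'],
    ],
];

print_r(group_by_reason($result));
print_r(page_offsets($result['total_items_count'], 2));
```

Each offset from `page_offsets` would be sent as the offset parameter of a separate request that keeps the same id, filters, and limit.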