OnPage API Non-indexable Pages

This endpoint returns a list of pages that are blocked from indexing by Google and other search engines through robots.txt, HTTP headers, or meta tag settings.

Pricing

Your account will not be charged for using this function. You can get the results of the task within the next 30 days for free.
The cost can be calculated on the Pricing page.

All POST data should be sent in JSON format (UTF-8 encoding). Tasks are set using the POST method. When setting a task, send all of its parameters in the task array of the generic POST array.

Description of the fields for setting a task:

Field name Type Description
id string ID of the task
required field
you can get this ID in the response of the Task POST endpoint
example:
"07131248-1535-0216-1000-17384017ad04"
limit integer the maximum number of returned pages
optional field
default value: 100
maximum value: 1000
offset integer offset in the results array of returned pages
optional field
default value: 0
if you specify the 10 value, the first ten pages in the results array will be omitted and the data will be provided for the successive pages
filters array array of results filtering parameters
optional field
you can add several filters at once (8 filters maximum)
you should set a logical operator (and, or) between the conditions
the following operators are supported:
regex, not_regex, <, <=, >, >=, =, <>, in, not_in, like, not_like
you can use the % operator with like and not_like to match any string of zero or more characters
example:
["reason","=","robots_txt"]

[["reason","<>","robots_txt"],
"and",
["url","not_like","%/wp-admin/%"]]

[["url","not_like","%/wp-admin/%"],
"and",
[["reason","<>","meta_tag"],"or",["reason","<>","http_header"]]]

The full list of possible filters is available by this link.
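
The limits described above (at most 1000 pages per request, at most 8 filters) can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of any official client: `count_conditions` and `build_task` are hypothetical helper names, and counting every `[field, operator, value]` triple as one filter toward the 8-filter limit is an assumption about how the API counts them.

```python
import json

MAX_LIMIT = 1000   # documented maximum for "limit"
MAX_FILTERS = 8    # documented maximum number of filters
OPERATORS = {
    "regex", "not_regex", "<", "<=", ">", ">=", "=", "<>",
    "in", "not_in", "like", "not_like",
}


def count_conditions(filters):
    """Count [field, operator, value] triples in a possibly nested filters array."""
    if not isinstance(filters, list):
        return 0  # logical operators such as "and" / "or" are plain strings
    if (
        len(filters) == 3
        and isinstance(filters[0], str)
        and isinstance(filters[1], str)
        and filters[1] in OPERATORS
    ):
        return 1
    return sum(count_conditions(part) for part in filters)


def build_task(task_id, filters=None, limit=100, offset=0):
    """Build one element of the POST array, using the documented defaults."""
    if not 1 <= limit <= MAX_LIMIT:  # the lower bound of 1 is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must not be negative")
    task = {"id": task_id, "limit": limit, "offset": offset}
    if filters is not None:
        if count_conditions(filters) > MAX_FILTERS:
            raise ValueError(f"at most {MAX_FILTERS} filters are allowed")
        task["filters"] = filters
    return task


# Nested filter taken from the field description above.
nested = [
    ["url", "not_like", "%/wp-admin/%"],
    "and",
    [["reason", "<>", "meta_tag"], "or", ["reason", "<>", "http_header"]],
]
payload = [build_task("07131248-1535-0216-1000-17384017ad04", nested, limit=10)]
print(json.dumps(payload, indent=2))
```

The resulting `payload` list is what would be serialized as the body of the POST request.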

As a response of the API server, you will receive JSON-encoded data containing a tasks array with the information specific to the set tasks.

Description of the fields in the results array:

Field name Type Description
version string the current version of the API
status_code integer general status code
you can find the full list of the response codes here
Note: we strongly recommend building handling for exceptional and error conditions into your system
status_message string general informational message
you can find the full list of general informational messages here
time string execution time, seconds
cost float total tasks cost, USD
tasks_count integer the number of tasks in the tasks array
tasks_error integer the number of tasks in the tasks array returned with an error
tasks array array of tasks
        id string task identifier
unique task identifier in our system in the UUID format
        status_code integer status code of the task
generated by DataForSEO; can be within the following range: 10000-60000
you can find the full list of the response codes here
        status_message string informational message of the task
you can find the full list of general informational messages here
        time string execution time, seconds
        cost float cost of the task, USD
        result_count integer number of elements in the result array
        path array URL path
        data object contains the same parameters that you specified in the POST request
        result array array of results
            crawl_progress string status of the crawling session
possible values: in_progress, finished
            crawl_status object details of the crawling session
               max_crawl_pages integer maximum number of pages to crawl
indicates the max_crawl_pages limit you specified when setting a task
               pages_in_queue integer number of pages that are currently in the crawling queue
               pages_crawled integer number of crawled pages
            total_items_count integer total number of relevant items in the database
            items_count integer number of items in the results array
            items array items array
              reason string the reason why the page is non-indexable
can take the following values: robots_txt, meta_tag, http_header, attribute, too_many_redirects
              url string url of the non-indexable page
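
Because a status code is reported both for the whole response and for each task, a client would typically check both before reading `items`. A minimal Python sketch, assuming `response` is the already-decoded JSON body and that 20000 is the success code (as in the sample request and response on this page); `ApiError` and `iter_items` are hypothetical names.

```python
class ApiError(Exception):
    """Raised when the response or one of its tasks reports a non-success code."""


OK = 20000  # success code used in the examples on this page


def iter_items(response):
    """Yield (reason, url) pairs from every task of a decoded response."""
    if response.get("status_code") != OK:
        raise ApiError(
            f'{response.get("status_code")}: {response.get("status_message")}'
        )
    for task in response.get("tasks") or []:
        if task.get("status_code") != OK:
            raise ApiError(
                f'task {task.get("id")}: {task.get("status_message")}'
            )
        for result in task.get("result") or []:
            for item in result.get("items") or []:
                yield item["reason"], item["url"]


# Trimmed-down response that follows the structure described above.
sample = {
    "status_code": 20000,
    "status_message": "Ok.",
    "tasks": [
        {
            "id": "07281559-0695-0216-0000-c269be8b7592",
            "status_code": 20000,
            "status_message": "Ok.",
            "result": [
                {
                    "items": [
                        {"reason": "robots_txt", "url": "https://dataforseo.com/go/"}
                    ]
                }
            ],
        }
    ],
}

for reason, url in iter_items(sample):
    print(reason, url)
```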


Instead of ‘login’ and ‘password’ use your credentials from https://app.dataforseo.com/api-access

# Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-access 
login="login" 
password="password" 
cred="$(printf '%s' "${login}:${password}" | base64)"
curl --location --request POST "https://api.dataforseo.com/v3/on_page/non_indexable" \
--header "Authorization: Basic ${cred}" \
--header "Content-Type: application/json" \
--data-raw '[
  {
    "id": "07281559-0695-0216-0000-c269be8b7592",
    "filters": [
      ["reason", "=", "robots_txt"],
      "and",
      ["url", "like", "%go%"]
    ],
    "limit": 10
  }
]'
<?php
// You can download this file from here https://cdn.dataforseo.com/v3/examples/php/php_RestClient.zip
require('RestClient.php');
$api_url = 'https://api.dataforseo.com/';
// Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-access
$client = new RestClient($api_url, null, 'login', 'password');

$post_array = array();
// simple way to get a result
$post_array[] = array(
   "id" => "07281559-0695-0216-0000-c269be8b7592",
   "filters" => [
      ["reason", "=", "robots_txt"],
      "and",
      ["url", "like", "%go%"]
   ],
   "limit" => 10
);
try {
   // POST /v3/on_page/non_indexable
   // the full list of possible parameters is available in documentation
   $result = $client->post('/v3/on_page/non_indexable', $post_array);
   print_r($result);
   // do something with post result
} catch (RestClientException $e) {
   echo "\n";
   print "HTTP code: {$e->getHttpCode()}\n";
   print "Error code: {$e->getCode()}\n";
   print "Message: {$e->getMessage()}\n";
   print  $e->getTraceAsString();
   echo "\n";
}
$client = null;
?>
const post_array = [];

post_array.push({
  "id": "07281559-0695-0216-0000-c269be8b7592",
  "filters": [
    ["reason", "=", "robots_txt"],
    "and",
    ["url", "like", "%go%"]
  ],
  "limit": 10
});

const axios = require('axios');

axios({
  method: 'post',
  url: 'https://api.dataforseo.com/v3/on_page/non_indexable',
  auth: {
    username: 'login',
    password: 'password'
  },
  data: post_array,
  headers: {
    'content-type': 'application/json'
  }
}).then(function (response) {
  var result = response['data']['tasks'];
  // Result data
  console.log(result);
}).catch(function (error) {
  console.log(error);
});
# You can download this file from here https://api.dataforseo.com/v3/_examples/python/_python_Client.zip
from client import RestClient
client = RestClient("login", "password")

post_data = dict()
# simple way to get a result
post_data[len(post_data)] = dict(
    id="07281559-0695-0216-0000-c269be8b7592",
    filters=[
        ["reason", "=", "robots_txt"],
        "and", 
        ["url", "like", "%go%"]
    ],
    limit=10
)
# POST /v3/on_page/non_indexable
# the full list of possible parameters is available in documentation
response = client.post("/v3/on_page/non_indexable", post_data)
# you can find the full list of the response codes here https://docs.dataforseo.com/v3/appendix/errors
if response["status_code"] == 20000:
    print(response)
    # do something with result
else:
    print("error. Code: %d Message: %s" % (response["status_code"], response["status_message"]))
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

namespace DataForSeoDemos
{
    public static partial class Demos
    {
        public static async Task on_page_non_indexable()
        {
            var httpClient = new HttpClient
            {
                BaseAddress = new Uri("https://api.dataforseo.com/"),
                // Instead of 'login' and 'password' use your credentials from https://app.dataforseo.com/api-access
                DefaultRequestHeaders = { Authorization = new AuthenticationHeaderValue("Basic", Convert.ToBase64String(Encoding.ASCII.GetBytes("login:password"))) }                
            };
            var postData = new List<object>();
            // simple way to get a result
            postData.Add(new
            {
                id = "07281559-0695-0216-0000-c269be8b7592",
                filters = new object[]
                {
                    new object[] { "reason", "=", "robots_txt" },
                    "and",
                    new object[] { "url", "like", "%go%" }
                },
                limit = 10
            });
            // POST /v3/on_page/non_indexable
            // the full list of possible parameters is available in documentation
            var taskPostResponse = await httpClient.PostAsync("/v3/on_page/non_indexable", new StringContent(JsonConvert.SerializeObject(postData), Encoding.UTF8, "application/json"));
            var result = JsonConvert.DeserializeObject<dynamic>(await taskPostResponse.Content.ReadAsStringAsync());
            // you can find the full list of the response codes here https://docs.dataforseo.com/v3/appendix/errors
            if (result.status_code == 20000)
            {
                // do something with result
                Console.WriteLine(result);
            }
            else
                Console.WriteLine($"error. Code: {result.status_code} Message: {result.status_message}");
        }
    }
}

The above command returns JSON structured like this:

{
  "version": "0.1.20200805",
  "status_code": 20000,
  "status_message": "Ok.",
  "time": "0.1075 sec.",
  "cost": 0,
  "tasks_count": 1,
  "tasks_error": 0,
  "tasks": [
    {
      "id": "07281559-0695-0216-0000-c269be8b7592",
      "status_code": 20000,
      "status_message": "Ok.",
      "time": "0.0236 sec.",
      "cost": 0,
      "result_count": 1,
      "path": [
        "v3",
        "on_page",
        "non_indexable"
      ],
      "data": {
        "api": "on_page",
        "function": "non_indexable"
      },
      "result": [
        {
          "crawl_progress": "finished",
          "crawl_status": {
            "max_crawl_pages": 10,
            "pages_in_queue": 0,
            "pages_crawled": 10
          },
          "total_items_count": 3,
          "items_count": 2,
          "items": [
            {
              "reason": "robots_txt",
              "url": "https://dataforseo.com/go/"
            },
            {
              "reason": "robots_txt",
              "url": "https://dataforseo.com/wp-admin/"
            }
          ]
        }
      ]
    }
  ]
}
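
A result like the one above can be condensed into a short report: whether the crawl has finished, how many pages were crawled, and how many returned items fall under each `reason`. The sketch below is illustrative only; `summarize` is a hypothetical helper, and `result` is assumed to be one element of a task's `result` array, already decoded from JSON.

```python
from collections import Counter


def summarize(result):
    """Summarize one element of a task's result array."""
    status = result["crawl_status"]
    items = result.get("items") or []
    return {
        "finished": result["crawl_progress"] == "finished",
        "pages_crawled": status["pages_crawled"],
        "max_crawl_pages": status["max_crawl_pages"],
        "by_reason": dict(Counter(item["reason"] for item in items)),
    }


# The result element from the sample response above.
result = {
    "crawl_progress": "finished",
    "crawl_status": {"max_crawl_pages": 10, "pages_in_queue": 0, "pages_crawled": 10},
    "total_items_count": 3,
    "items_count": 2,
    "items": [
        {"reason": "robots_txt", "url": "https://dataforseo.com/go/"},
        {"reason": "robots_txt", "url": "https://dataforseo.com/wp-admin/"},
    ],
}

print(summarize(result))
```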