Golang Colly Fake Headers Integration
The following are two examples of how to integrate the Fake Browser Headers API and the Fake User-Agent API into your Go Colly based web scrapers.
Go Colly Fake User-Agent API Integration
To integrate the Fake User-Agent API, configure your scraper to retrieve a batch of the most up-to-date user-agents when it starts, then pick a random user-agent from this list for each request.
Here is an example Go Colly scraper integration:
package main
import (
	"bytes"
	"encoding/json"
	"log"
	"math/rand"
	"net/http"
	"time"
	"github.com/gocolly/colly"
)
// FakeUserAgentResponse models the JSON body returned by the User-Agent API.
type FakeUserAgentResponse struct {
	Result []string `json:"result"`
}
// RandomString returns a random element of userAgentList.
// The list must not be empty, because rand.Intn panics when its argument is 0.
func RandomString(userAgentList []string) string {
	randomIndex := rand.Intn(len(userAgentList))
	return userAgentList[randomIndex]
}
// GetUserAgentList fetches a batch of user-agents, or returns nil on any failure.
func GetUserAgentList() []string {
	// ScrapeOps User-Agent API Endpoint
	scrapeopsAPIKey := "YOUR_API_KEY"
	scrapeopsAPIEndpoint := "http://headers.scrapeops.io/v1/user-agents?api_key=" + scrapeopsAPIKey
	req, err := http.NewRequest("GET", scrapeopsAPIEndpoint, nil)
	if err != nil {
		return nil
	}
	client := &http.Client{
		Timeout: 10 * time.Second,
	}
	// Make Request
	resp, err := client.Do(req)
	if err != nil {
		return nil
	}
	// Close the body on every path, including non-200 responses
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil
	}
	// Convert Body To JSON
	var fakeUserAgentResponse FakeUserAgentResponse
	if err := json.NewDecoder(resp.Body).Decode(&fakeUserAgentResponse); err != nil {
		return nil
	}
	return fakeUserAgentResponse.Result
}
func main() {
	// Instantiate default collector
	c := colly.NewCollector(colly.AllowURLRevisit())
	// Get Fake User Agents From API
	userAgentList := GetUserAgentList()
	// Set Random Fake User Agent (skipped if the API returned no user-agents)
	c.OnRequest(func(r *colly.Request) {
		if len(userAgentList) > 0 {
			r.Headers.Set("User-Agent", RandomString(userAgentList))
		}
	})
	// Print the Response
	c.OnResponse(func(r *colly.Response) {
		log.Printf("%s\n", bytes.Replace(r.Body, []byte("\n"), nil, -1))
	})
	// Fetch httpbin.org/headers five times
	for i := 0; i < 5; i++ {
		c.Visit("http://httpbin.org/headers")
	}
}
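Because `rand.Intn` panics when its argument is 0, picking from an empty list (for example after a failed API call) needs a guard. The following is a minimal sketch of one way to handle that case, assuming you would rather send a fixed fallback user-agent than skip the header. The `pickUserAgent` helper and the `fallbackUserAgent` value are hypothetical names introduced for illustration and are not part of the API or of Colly.

```go
package main

import (
	"fmt"
	"math/rand"
)

// fallbackUserAgent is a hypothetical placeholder value, not one supplied by the API.
const fallbackUserAgent = "Mozilla/5.0 (compatible; ExampleScraper/1.0)"

// pickUserAgent returns a random entry from userAgentList,
// or fallback when the list is empty.
func pickUserAgent(userAgentList []string, fallback string) string {
	if len(userAgentList) == 0 {
		return fallback
	}
	return userAgentList[rand.Intn(len(userAgentList))]
}

func main() {
	// An empty (nil) list yields the fallback instead of panicking.
	fmt.Println(pickUserAgent(nil, fallbackUserAgent))
	// A one-element list always yields that element.
	fmt.Println(pickUserAgent([]string{"ExampleAgent/1.0"}, fallbackUserAgent))
}
```

Inside the `OnRequest` callback, a helper like this could take the place of the direct `RandomString` call.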
Go Colly Fake Browser Headers API Integration
To integrate the Fake Browser Headers API, configure your scraper to retrieve a batch of the most up-to-date headers when it starts, then pick a random set of headers from this list for each request.
Here is an example Go Colly scraper integration:
package main
import (
	"bytes"
	"encoding/json"
	"log"
	"math/rand"
	"net/http"
	"time"
	"github.com/gocolly/colly"
)
// FakeBrowserHeadersResponse models the JSON body returned by the Browser Headers API.
type FakeBrowserHeadersResponse struct {
	Result []map[string]string `json:"result"`
}
// RandomHeader returns a random header set from headersList.
// The list must not be empty, because rand.Intn panics when its argument is 0.
func RandomHeader(headersList []map[string]string) map[string]string {
	randomIndex := rand.Intn(len(headersList))
	return headersList[randomIndex]
}
// GetHeadersList fetches a batch of header sets, or returns nil on any failure.
func GetHeadersList() []map[string]string {
	// ScrapeOps Browser Headers API Endpoint
	scrapeopsAPIKey := "YOUR_API_KEY"
	scrapeopsAPIEndpoint := "http://headers.scrapeops.io/v1/browser-headers?api_key=" + scrapeopsAPIKey
	req, err := http.NewRequest("GET", scrapeopsAPIEndpoint, nil)
	if err != nil {
		return nil
	}
	client := &http.Client{
		Timeout: 10 * time.Second,
	}
	// Make Request
	resp, err := client.Do(req)
	if err != nil {
		return nil
	}
	// Close the body on every path, including non-200 responses
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil
	}
	// Convert Body To JSON
	var fakeBrowserHeadersResponse FakeBrowserHeadersResponse
	if err := json.NewDecoder(resp.Body).Decode(&fakeBrowserHeadersResponse); err != nil {
		return nil
	}
	return fakeBrowserHeadersResponse.Result
}
func main() {
	// Instantiate default collector
	c := colly.NewCollector(colly.AllowURLRevisit())
	// Get Fake Browser Headers From API
	headersList := GetHeadersList()
	// Set Random Fake Browser Headers (skipped if the API returned no headers)
	c.OnRequest(func(r *colly.Request) {
		if len(headersList) == 0 {
			return
		}
		randomHeader := RandomHeader(headersList)
		for key, value := range randomHeader {
			r.Headers.Set(key, value)
		}
	})
	// Print the Response
	c.OnResponse(func(r *colly.Response) {
		log.Printf("%s\n", bytes.Replace(r.Body, []byte("\n"), nil, -1))
	})
	// Fetch httpbin.org/headers five times
	for i := 0; i < 5; i++ {
		c.Visit("http://httpbin.org/headers")
	}
}
Here the scraper will use a random set of browser headers for each request.
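The decode-and-apply steps above can be tried without network access. The sketch below parses a made-up payload in the same `{"result": [...]}` shape that the response struct expects and copies one random header set into an `http.Header`. The payload contents and the `parseHeaderSets` and `applyRandomHeaders` helpers are assumptions for illustration only; real API responses will contain different header names and values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
	"net/http"
	"strings"
)

// browserHeadersResponse mirrors the response struct used in the example above.
type browserHeadersResponse struct {
	Result []map[string]string `json:"result"`
}

// samplePayload is invented data, not a real API response.
const samplePayload = `{"result":[
	{"user-agent":"ExampleBrowser/1.0","accept-language":"en-US"},
	{"user-agent":"ExampleBrowser/2.0","accept-language":"en-GB"}
]}`

// parseHeaderSets decodes a JSON body into a list of header sets.
func parseHeaderSets(body string) ([]map[string]string, error) {
	var parsed browserHeadersResponse
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&parsed); err != nil {
		return nil, err
	}
	return parsed.Result, nil
}

// applyRandomHeaders copies one randomly chosen header set into h.
// It reports false and leaves h untouched when sets is empty.
func applyRandomHeaders(h http.Header, sets []map[string]string) bool {
	if len(sets) == 0 {
		return false
	}
	for key, value := range sets[rand.Intn(len(sets))] {
		h.Set(key, value) // Set canonicalizes the key, e.g. "user-agent" becomes "User-Agent"
	}
	return true
}

func main() {
	sets, err := parseHeaderSets(samplePayload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	h := http.Header{}
	if applyRandomHeaders(h, sets) {
		fmt.Println(h.Get("User-Agent"), h.Get("Accept-Language"))
	}
}
```

In a Colly `OnRequest` callback, `r.Headers` is a `*http.Header`, so a helper of this shape could be called as `applyRandomHeaders(*r.Headers, headersList)`.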
API Parameters
The following is a list of API parameters that you can include with your requests to customise the header list response.
Parameter | Description |
---|---|
api_key | This is a required parameter. You can get your Free API key here. |
num_results | By default the API returns a list of 10 results. You can increase that number by changing num_results, up to a maximum of 100. |
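As an illustration of passing these parameters, the sketch below builds the request URL with Go's `net/url` package instead of string concatenation, so values are query-escaped automatically. The `buildEndpoint` helper is a hypothetical name, and the value 50 is only an example within the documented maximum of 100.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildEndpoint appends api_key and num_results to baseURL as query parameters.
func buildEndpoint(baseURL, apiKey string, numResults int) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("api_key", apiKey)
	q.Set("num_results", strconv.Itoa(numResults))
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String(), nil
}

func main() {
	endpoint, err := buildEndpoint("http://headers.scrapeops.io/v1/browser-headers", "YOUR_API_KEY", 50)
	if err != nil {
		fmt.Println("invalid base URL:", err)
		return
	}
	fmt.Println(endpoint)
	// prints http://headers.scrapeops.io/v1/browser-headers?api_key=YOUR_API_KEY&num_results=50
}
```

The resulting string can be used in place of `scrapeopsAPIEndpoint` in either of the examples above.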