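This script bulk-crawls FOFA search results. It relies on a FOFA API that is not public: the API is not meant for direct use by users, so calling it is essentially the same as scraping the web page. Scraping the page does not require a premium membership, and as long as you do not crawl too much it should not cause problems.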
Script environment:
Python 2
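Usage:
Edit the Base64 query string assigned to `qbase64` (line 12 of the original script), then choose whether to collect links, IPs, or hosts by changing the field name in the `re.findall` regex to `link`, `ip`, or `host` (line 53 of the original script). You also need a config.txt in the same directory as the script, containing your own fofa_token.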
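For reference, the `qbase64` value appears to be nothing more than the search expression Base64-encoded and then URL-escaped; the value shipped in the script decodes to `app="WUZHICMS"`. A minimal sketch of producing it for your own query follows. The helper name `encode_query` is made up for illustration and is not part of the script, and the imports are arranged so the snippet runs on both Python 2 and Python 3.

```python
import base64

try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def encode_query(query):
    """Base64-encode a FOFA search expression and URL-escape the result.

    Hypothetical helper, not part of the original script.
    """
    raw = base64.b64encode(query.encode("utf-8")).decode("ascii")
    # safe="" makes quote() escape "/" as well; "=" padding becomes %3D.
    return quote(raw, safe="")


print(encode_query('app="WUZHICMS"'))
```

Running this prints `YXBwPSJXVVpISUNNUyI%3D`, the same string that is hard-coded in the script, so you can paste the output for any other query straight into `qbase64`.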
Script code:
# -*- coding: utf-8 -*-
import re
import sys

import requests

# Python 2 only: force the default string encoding to UTF-8.
defaultencoding = 'utf-8'
if sys.getdefaultencoding() != defaultencoding:
    reload(sys)
    sys.setdefaultencoding(defaultencoding)

# Base64-encoded, URL-escaped search query; this one decodes to app="WUZHICMS".
qbase64 = "YXBwPSJXVVpISUNNUyI%3D"

# config.txt sits next to the script; its first line is your fofa_token.
with open('config.txt', 'r') as config:
    cookie_config = config.readline().strip()

header = {
    'Authorization': cookie_config
}


def request(url):
    """Return the response body as text, or None if the request fails."""
    try:
        return requests.get(url, headers=header).text
    except requests.exceptions.ConnectionError as e:
        # ConnectTimeout and ProxyError are subclasses of ConnectionError.
        print(e)
        return None


def pn_count(url):
    """Return the number of result pages, assuming 10 results per page."""
    text = request(url)
    if text is None:
        sys.exit("request failed, cannot read the total count")
    total_number = re.findall(r'"total":(\d+)', text)
    if not total_number:
        sys.exit("no total in the response; check the token in config.txt")
    # Round up so a partly filled last page is still counted.
    return (int(total_number[0]) + 9) // 10


def spider():
    current_url = "https://api.fofa.so/v1/search?qbase64=" + qbase64
    pn = pn_count(current_url)
    print("spider website is :" + current_url)
    print("The results are {} pages in total".format(pn))
    stop_page = int(raw_input("please input stop page: \n"))
    # Never request more pages than the search actually has.
    last_page = min(stop_page, pn)
    with open("result.txt", "w") as doc:
        for i in range(1, last_page + 1):
            print("Now write " + str(i) + " page")
            text = request('https://api.fofa.so/v1/search?pn=' + str(i) + '&qbase64=' + qbase64)
            if text is None:
                print("error!!")
                continue
            # Change "link" to "ip" or "host" to collect a different field.
            urllist = re.findall('"link":"(.*?)"', text)
            for j in urllist:
                doc.write(j + "\n")
    print("OK,Spider is End .")


def main():
    spider()


if __name__ == '__main__':
    main()
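The two parsing steps in the script, counting pages and pulling one field out of the response, can be sketched in isolation as below. The helper names `page_count` and `extract_field` and the sample response string are hypothetical: the sample only imitates the `"total":` and `"link":` patterns that the script's regexes look for, and the real response may be shaped differently. The page size of 10 is the script's own assumption.

```python
import re

PAGE_SIZE = 10  # results per page, as assumed by the script


def page_count(text, page_size=PAGE_SIZE):
    """Return how many pages are needed for the "total" found in text."""
    match = re.search(r'"total":(\d+)', text)
    if match is None:
        raise ValueError('no "total" field found in the response')
    total = int(match.group(1))
    # Ceiling division using integer arithmetic only.
    return (total + page_size - 1) // page_size


def extract_field(text, field="link"):
    """Return every value of the given field ("link", "ip" or "host")."""
    pattern = '"{}":"(.*?)"'.format(re.escape(field))
    return re.findall(pattern, text)


# Made-up sample that only imitates the patterns the regexes expect.
sample = ('{"total":23,"results":['
          '{"link":"http://a.example","ip":"192.0.2.1","host":"a.example"},'
          '{"link":"http://b.example","ip":"192.0.2.2","host":"b.example"}]}')

print(page_count(sample))
print(extract_field(sample, "ip"))
```

Passing the field name as an argument removes the need to edit the regex by hand each time you switch between links, IPs, and hosts, and the integer ceiling division gives the same page count on Python 2 and Python 3.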
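The script is not particularly polished. It is just a small tool for my own use that I tweak slightly whenever I need it, so I did not add any argument parsing. After all, crawling too much of someone else's site is not really appropriate.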
PS: If the team at Baimaohui (白帽汇) feels this is inappropriate, please contact me and I will take down the post and the script.