# Run
Use supervisor as the process manager to start the Node crawler processes and keep them running. Edit its configuration file:
vim /data/services/supervisor.conf
For example:
[program:node1]
command=node /data/webapps/test.spider.duowan.com/protected/index.js
process_name=WEB_test.spider.duowan.com
directory=/data/webapps/test.spider.duowan.com/protected/
numprocs=1
autostart=true
autorestart=true
stdout_logfile=/tmp/WEB_test.spider.duowan.com.log
After configuring the program, check its status:
sudo supervisorctl
The main scripts to start are index.js, checkProxyPool.js, crawlMaster.js, and crawlWorker.js.
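
The other scripts could be registered with supervisor in the same way as index.js. The fragment below is only a sketch: the program names, the log paths, and the assumption that all scripts live in the same `protected/` directory are hypothetical and should be adjusted to the real deployment.

```ini
; Hypothetical entries modeled on the [program:node1] example.
; Program names, script locations, and log paths are assumptions.

[program:checkProxyPool]
command=node /data/webapps/test.spider.duowan.com/protected/checkProxyPool.js
directory=/data/webapps/test.spider.duowan.com/protected/
numprocs=1
autostart=true
autorestart=true
stdout_logfile=/tmp/checkProxyPool.log

[program:crawlMaster]
command=node /data/webapps/test.spider.duowan.com/protected/crawlMaster.js
directory=/data/webapps/test.spider.duowan.com/protected/
numprocs=1
autostart=true
autorestart=true
stdout_logfile=/tmp/crawlMaster.log

[program:crawlWorker]
command=node /data/webapps/test.spider.duowan.com/protected/crawlWorker.js
directory=/data/webapps/test.spider.duowan.com/protected/
numprocs=1
autostart=true
autorestart=true
stdout_logfile=/tmp/crawlWorker.log
```

After editing the file, `sudo supervisorctl reread` followed by `sudo supervisorctl update` should load the new programs, and `sudo supervisorctl status` lists their state.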