Tuesday, June 13, 2017

Python Learn-as-You-Go Notes - Crawler (Web Scraper) - Hands-On - Tigerair 2_selenium

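With selenium, there is no longer any need to inspect what the request's data payload carries!
selenium drives the browser directly, so the fields on the page have to be given their values directly too, rather than through data!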

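First, set up the webdriver!
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

#  Raw string so the backslashes in the Windows path stay literal
chrome_path = r'D:\pyCrawler\selenium_driver_chrome\chromedriver.exe'
driver = webdriver.Chrome(chrome_path)
driver.maximize_window()
driver.set_page_load_timeout(60)
driver.get(TigerUrl)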

Next, find the locators for the fields on the page!

Round trip: this one should not need changing!
Departure: ControlGroupSearchView_AvailabilitySearchInputSearchVieworiginStation1
Arrival: ControlGroupSearchView_AvailabilitySearchInputSearchViewdestinationStation1
Outbound day: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketDay1
Outbound year-month: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketMonth1
Return day: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketDay2
Return year-month: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketMonth2
Adults: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_ADT
Children: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_CHD
Infants: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_INFANT
Get flights: ControlGroupSearchView_ButtonSubmit

element = WebDriverWait(driver, 10, 0.5).until(EC.presence_of_element_located((By.ID,
 'ControlGroupSearchView_ButtonSubmit')))
#  Dropdowns are handled in steps: locate the element first, then pick from its options
#  Departure airport
el = Select(driver.find_element_by_id(
    'ControlGroupSearchView_AvailabilitySearchInputSearchVieworiginStation1'))
el.select_by_value('TPE')
#  Arrival airport
el = Select(driver.find_element_by_id(
    'ControlGroupSearchView_AvailabilitySearchInputSearchViewdestinationStation1'))
el.select_by_value('DMK')
#  Outbound day
el = Select(driver.find_element_by_id(
    'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketDay1'))
el.select_by_value('21')
#  Outbound year-month
el = Select(driver.find_element_by_id(
    'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketMonth1'))
el.select_by_value('2017-06')
#  Return day
el = Select(driver.find_element_by_id(
    'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketDay2'))
el.select_by_value('30')
#  Return year-month
el = Select(driver.find_element_by_id(
    'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketMonth2'))
el.select_by_value('2017-06')
#  Number of adults
el = Select(driver.find_element_by_id(
    'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_ADT'))
el.select_by_value('3')
#  Number of children
el = Select(driver.find_element_by_id(
    'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_CHD'))
el.select_by_value('0')
#  Number of infants
el = Select(driver.find_element_by_id(
    'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_INFANT'))
el.select_by_value('0')
#  Click "Get flights"
driver.find_element_by_id(
    'ControlGroupSearchView_ButtonSubmit').click()
#  Wait for the page to load; continue as soon as the required element is present
WebDriverWait(driver, 10, 0.5).until(EC.presence_of_element_located((By.ID, 'flightSpinner')))
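
The nine dropdown selections above all follow the same pattern, so they could probably be driven from one mapping of element ID to option value. Here is a minimal sketch of that idea. The helper names build_form_values and fill_form are made up for illustration. It also assumes that day options are zero-padded two-digit strings and month options look like YYYY-MM; only '21', '30' and '2017-06' appear above, so the padding is a guess.

```python
from datetime import date

# Common prefix of the element IDs listed above.
PREFIX = 'ControlGroupSearchView_AvailabilitySearchInputSearchView'


def build_form_values(origin, destination, depart, ret,
                      adults=1, children=0, infants=0):
    """Map each dropdown's element ID to the option value to select."""
    return {
        PREFIX + 'originStation1': origin,
        PREFIX + 'destinationStation1': destination,
        # Assumption: day values are zero-padded ('05'), months are 'YYYY-MM'.
        PREFIX + '_DropDownListMarketDay1': depart.strftime('%d'),
        PREFIX + '_DropDownListMarketMonth1': depart.strftime('%Y-%m'),
        PREFIX + '_DropDownListMarketDay2': ret.strftime('%d'),
        PREFIX + '_DropDownListMarketMonth2': ret.strftime('%Y-%m'),
        PREFIX + '_DropDownListPassengerType_ADT': str(adults),
        PREFIX + '_DropDownListPassengerType_CHD': str(children),
        PREFIX + '_DropDownListPassengerType_INFANT': str(infants),
    }


def fill_form(driver, values):
    """Select every value in its dropdown, in the order the dict was built."""
    # Imported here so the mapping can be built without selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select
    for element_id, value in values.items():
        Select(driver.find_element(By.ID, element_id)).select_by_value(value)


values = build_form_values('TPE', 'DMK', date(2017, 6, 21), date(2017, 6, 30),
                           adults=3)
```

With the driver from the setup step, fill_form(driver, values) would then replace the block of repeated Select calls, and the submit click stays as it is.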

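The search conditions can be refined later, for example by reading them from a text file!
Still, this is basically already enough to get the data!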
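
As for moving the search conditions into a text file, one possible shape is an INI-style file read with the standard library's configparser. This is only a sketch: the section name, the key names and the load_conditions helper are all hypothetical, and the sample text stands in for a real file.

```python
import configparser
from datetime import datetime

# Hypothetical file contents; in practice this would live in its own text file.
SAMPLE = """
[search]
origin = TPE
destination = DMK
depart = 2017-06-21
return = 2017-06-30
adults = 3
children = 0
infants = 0
"""


def load_conditions(text):
    """Parse INI-style text into a dict of typed search conditions."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser['search']
    return {
        'origin': section['origin'],
        'destination': section['destination'],
        'depart': datetime.strptime(section['depart'], '%Y-%m-%d').date(),
        'return': datetime.strptime(section['return'], '%Y-%m-%d').date(),
        'adults': section.getint('adults'),
        # Children and infants default to 0 when the keys are left out.
        'children': section.getint('children', fallback=0),
        'infants': section.getint('infants', fallback=0),
    }


conditions = load_conditions(SAMPLE)
print(conditions['origin'], conditions['depart'], conditions['adults'])
```

To read an actual file instead of the inline sample, parser.read(path, encoding='utf-8') can be used in place of read_string.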
from bs4 import BeautifulSoup

# Hand the page source over to BeautifulSoup for parsing
soup = BeautifulSoup(driver.page_source, 'html5lib')
tbs = soup.select('.select-flight')

start_lightPrice = soup.select('#tableMarket1 > tr > td')[1].select('label > span')
print('sl:', start_lightPrice)
start_comboPrice = soup.select('#tableMarket1 > tr > td')[2].select('label > span')
print('sc:', start_comboPrice)
end_lightPrice = soup.select('#tableMarket2 > tr > td')[1].select('label > span')
print('el:', end_lightPrice)
end_comboPrice = soup.select('#tableMarket2 > tr > td')[2].select('label > span')
print('ec:', end_comboPrice)
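
The select() calls above return lists of tags, so the price text still has to be turned into a number before it can be compared with anything. Here is a rough sketch of that step. The sample strings are invented, since the real format of the span text is not shown here, and parse_price is a hypothetical helper.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the first number in text as a Decimal, or None if there is none."""
    match = re.search(r'\d[\d,]*(?:\.\d+)?', text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return Decimal(match.group().replace(',', ''))


# Hypothetical span texts; the real site may format prices differently.
samples = ['TWD 3,499', ' 2,199.50 ', 'Sold out']
prices = [parse_price(s) for s in samples]
```

Applied to the results above, that would look like [parse_price(span.get_text()) for span in start_lightPrice].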

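BeautifulSoup also supports CSS selectors, but for the td cells I kept trying to locate them with td.light price and never got it to work. If any passing expert knows how, please advise!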
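
One likely cause: in a CSS selector a space means "descendant", so td.light price looks for a <price> element inside td.light. If the cell is really written as class="light price" (an assumption, since the page markup is not shown here), it carries two separate classes, and those are chained with dots: td.light.price. A small sketch on made-up HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's class names may differ.
html = """
<table id="tableMarket1">
  <tr>
    <td>flight</td>
    <td class="light price"><label><span>3,499</span></label></td>
    <td class="combo price"><label><span>4,299</span></label></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# A space is the descendant combinator: this looks for <price> inside td.light.
wrong = soup.select('td.light price')

# Chained dots require both classes on the same element.
light = soup.select('td.light.price label > span')
combo = soup.select('td.combo.price label > span')

print(len(wrong), light[0].get_text(), combo[0].get_text())
```

If this matches the real markup, the positional [1] and [2] indexing would no longer be needed.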

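With a bit more work, this could take a budget and a travel date range, let the program crawl automatically, and send a mail as soon as a cheap ticket turns up!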
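
The budget check and the notification mail could be sketched roughly like this. Everything here is illustrative: the fare records, the addresses and the find_deals and build_alert helpers are hypothetical, and the mail is only built, not sent.

```python
from decimal import Decimal
from email.message import EmailMessage


def find_deals(fares, budget):
    """Keep only the fares whose total price is within the budget."""
    return [fare for fare in fares if fare['total'] <= budget]


def build_alert(deals, sender, recipient):
    """Build (but do not send) a plain-text notification mail."""
    msg = EmailMessage()
    msg['Subject'] = 'Cheap fares found: %d' % len(deals)
    msg['From'] = sender
    msg['To'] = recipient
    lines = ['%s -> %s: %s' % (d['depart'], d['return'], d['total'])
             for d in deals]
    msg.set_content('\n'.join(lines))
    return msg


# Hypothetical crawl results: one entry per date pair that was searched.
fares = [
    {'depart': '2017-06-21', 'return': '2017-06-30', 'total': Decimal('7800')},
    {'depart': '2017-06-22', 'return': '2017-07-01', 'total': Decimal('5600')},
]

deals = find_deals(fares, budget=Decimal('6000'))
if deals:
    alert = build_alert(deals, 'crawler@example.com', 'me@example.com')
    # Sending is left out of this sketch; with a reachable SMTP server,
    # smtplib.SMTP(...).send_message(alert) would deliver it.
```

Running the crawl on a schedule and feeding each result set through find_deals would then give the "notify me when it gets cheap" behaviour.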

Then again, there are already plenty of well-made flight search tools out there.... =..=




