Selenium drives the browser directly, so the form fields have to be filled in on the page itself rather than passed in as request data.
First, set up the webdriver.
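from selenium import webdriver

# Raw string so the backslashes in the Windows path are taken literally
chrome_path = r'D:\pyCrawler\selenium_driver_chrome\chromedriver.exe'
# Selenium 3 style; Selenium 4 takes webdriver.Chrome(service=Service(chrome_path)) instead
driver = webdriver.Chrome(chrome_path)
driver.maximize_window()
driver.set_page_load_timeout(60)
driver.get(TigerUrl)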
Next, locate the fields on the page. These are the element IDs:
Round trip: this one should not need any adjustment.
Departure: ControlGroupSearchView_AvailabilitySearchInputSearchVieworiginStation1
Arrival: ControlGroupSearchView_AvailabilitySearchInputSearchViewdestinationStation1
Outbound day: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketDay1
Outbound year-month: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketMonth1
Return day: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketDay2
Return year-month: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketMonth2
Adults: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_ADT
Children: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_CHD
Infants: ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_INFANT
Get flights (submit button): ControlGroupSearchView_ButtonSubmit
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

# Wait (up to 10 s, polling every 0.5 s) until the submit button is in the DOM
element = WebDriverWait(driver, 10, 0.5).until(
    EC.presence_of_element_located((By.ID, 'ControlGroupSearchView_ButtonSubmit')))

# Drop-down lists are handled in two steps: locate the <select>, then pick an option.
# find_element(By.ID, ...) is used because find_element_by_id is gone from newer Selenium 4 releases.
# Departure airport
el = Select(driver.find_element(By.ID, 'ControlGroupSearchView_AvailabilitySearchInputSearchVieworiginStation1'))
el.select_by_value('TPE')
# Arrival airport
el = Select(driver.find_element(By.ID, 'ControlGroupSearchView_AvailabilitySearchInputSearchViewdestinationStation1'))
el.select_by_value('DMK')
# Outbound day
el = Select(driver.find_element(By.ID, 'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketDay1'))
el.select_by_value('21')
# Outbound year-month
el = Select(driver.find_element(By.ID, 'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketMonth1'))
el.select_by_value('2017-06')
# Return day
el = Select(driver.find_element(By.ID, 'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketDay2'))
el.select_by_value('30')
# Return year-month
el = Select(driver.find_element(By.ID, 'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListMarketMonth2'))
el.select_by_value('2017-06')
# Number of adults
el = Select(driver.find_element(By.ID, 'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_ADT'))
el.select_by_value('3')
# Number of children
el = Select(driver.find_element(By.ID, 'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_CHD'))
el.select_by_value('0')
# Number of infants
el = Select(driver.find_element(By.ID, 'ControlGroupSearchView_AvailabilitySearchInputSearchView_DropDownListPassengerType_INFANT'))
el.select_by_value('0')

# Click "Get flights"
driver.find_element(By.ID, 'ControlGroupSearchView_ButtonSubmit').click()

# Explicit wait on the results page: carry on as soon as the element we need shows up
WebDriverWait(driver, 10, 0.5).until(
    EC.presence_of_element_located((By.ID, 'flightSpinner')))
The search criteria can be tidied up later, for example by reading them from a text file.
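Here is a rough sketch of how the text-file idea could look: one `name=value` pair per line, with short names mapped onto the element IDs listed earlier. The file format, the short names (`origin`, `adults` and so on) and the `parse_criteria` / `load_criteria` helpers are all my own assumptions, not something the site or Selenium prescribes.

```python
# Sketch only: the name=value file format and the short field names are assumptions.
ID_PREFIX = 'ControlGroupSearchView_AvailabilitySearchInputSearchView'

# Short name used in the text file -> suffix of the element ID on the page.
FIELD_SUFFIXES = {
    'origin': 'originStation1',
    'destination': 'destinationStation1',
    'depart_day': '_DropDownListMarketDay1',
    'depart_month': '_DropDownListMarketMonth1',
    'return_day': '_DropDownListMarketDay2',
    'return_month': '_DropDownListMarketMonth2',
    'adults': '_DropDownListPassengerType_ADT',
    'children': '_DropDownListPassengerType_CHD',
    'infants': '_DropDownListPassengerType_INFANT',
}


def parse_criteria(lines):
    """Turn 'name=value' lines into {element_id: value}; skip blanks and # comments."""
    criteria = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        name, sep, value = line.partition('=')
        name = name.strip()
        if not sep or name not in FIELD_SUFFIXES:
            raise ValueError('unrecognised criteria line: %r' % raw)
        criteria[ID_PREFIX + FIELD_SUFFIXES[name]] = value.strip()
    return criteria


def load_criteria(path):
    """Read the criteria file at `path` (UTF-8 text)."""
    with open(path, encoding='utf-8') as fh:
        return parse_criteria(fh)
```

Each `(element_id, value)` pair in the returned dict could then go through the same `Select(...).select_by_value(value)` call in a loop, instead of nine hand-written blocks.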
That said, this is already enough to get the data.
# Hand page_source over to BeautifulSoup for parsing
from bs4 import BeautifulSoup

soup = BeautifulSoup(driver.page_source, 'html5lib')
tbs = soup.select('.select-flight')

# Outbound fares (tableMarket1): second cell is the light fare, third is the combo fare
start_lightPrice = soup.select('#tableMarket1 > tr > td')[1].select('label > span')
print('sl:', start_lightPrice)
start_comboPrice = soup.select('#tableMarket1 > tr > td')[2].select('label > span')
print('sc:', start_comboPrice)

# Return fares (tableMarket2), same cell layout
end_lightPrice = soup.select('#tableMarket2 > tr > td')[1].select('label > span')
print('el:', end_lightPrice)
end_comboPrice = soup.select('#tableMarket2 > tr > td')[2].select('label > span')
print('ec:', end_comboPrice)
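BeautifulSoup supports CSS selectors too, but for the td cells I kept trying to locate them with td.light price and never got it to work. If anyone more experienced passing by knows why, please point me in the right direction!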
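One possible explanation, assuming the cell is marked up as `<td class="light price">`: a space inside a class attribute separates two classes, whereas a space inside a CSS selector means "descendant", so `td.light price` looks for a `<price>` element inside `td.light`. Chaining the classes as `td.light.price` should match instead. The HTML below is a made-up stand-in, since I have not checked the real page's markup.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real fare table (assumption).
html = '''
<table><tbody id="tableMarket1"><tr>
  <td class="flight">flight info</td>
  <td class="light price"><label><span>2,399</span></label></td>
  <td class="combo price"><label><span>3,199</span></label></td>
</tr></tbody></table>
'''

soup = BeautifulSoup(html, 'html.parser')

# Space = descendant combinator: this searches for a <price> tag inside td.light.
wrong = soup.select('td.light price')

# Chained classes: a td that carries both "light" and "price".
right = soup.select('td.light.price > label > span')

print(len(wrong))                      # 0
print([s.get_text() for s in right])   # ['2,399']
```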
With a bit more work, this could take a budget and a travel window, crawl automatically, and send an email as soon as a cheap enough fare turns up.
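Here is a rough sketch of that alert step, under several assumptions: prices arrive as text such as `'TWD 2,399'`, the budget is a plain number, and the addresses are placeholders. `parse_price` and `build_alert` are hypothetical helpers. Only the message is built here; actually sending it (for example with `smtplib.SMTP(...).send_message(msg)`) depends on your mail server.

```python
import re
from decimal import Decimal
from email.message import EmailMessage


def parse_price(text):
    """Pull the first number out of a price string like 'TWD 2,399' (format is an assumption)."""
    match = re.search(r'\d[\d,]*(?:\.\d+)?', text)
    if match is None:
        raise ValueError('no price found in %r' % text)
    return Decimal(match.group().replace(',', ''))


def build_alert(outbound_text, inbound_text, budget, sender, recipient):
    """Return an EmailMessage if the round trip fits the budget, else None."""
    total = parse_price(outbound_text) + parse_price(inbound_text)
    if total > Decimal(str(budget)):
        return None
    msg = EmailMessage()
    msg['Subject'] = 'Cheap fare found: %s' % total
    msg['From'] = sender
    msg['To'] = recipient
    msg.set_content('Outbound %s + return %s = %s (budget %s)'
                    % (outbound_text, inbound_text, total, budget))
    return msg


# Placeholder values for illustration only.
msg = build_alert('TWD 2,399', 'TWD 2,801', 6000, 'bot@example.com', 'me@example.com')
print(msg['Subject'] if msg else 'over budget')
```

Running the crawl on a schedule and calling `build_alert` with the scraped span text would cover the "notify me" part; the travel window could be handled by looping over dates before each search.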
Then again, there are already plenty of well-built flight search tools out there.... =..=