👶 내일배움단 (Naeil Baeum Dan) / Web Development Comprehensive Course: Dev Journal

Week 3 Quiz: Going Back Over the Web Scraping (Crawling) Solution


 

 

Teacher: "What matters is making it work."

 

Me: Right, so... could you maybe explain how in a bit more detail...?

 

 

 

[Understanding everything would be nice, but for now let's just get used to how the flow goes!!]


 

Web scraping (let's scrape just the rank, title, and star rating from the Naver Movie page)

 

Install bs4, the package we are about to import

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.nhn?sel=pnt&date=20200303',headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')

 

 

Specifying the range

To get an element's path, open Inspect in the browser and copy it with Copy > Copy selector.

#old_content > table > tbody > tr:nth-child(2) > td.title > div > a  # Green Book
#old_content > table > tbody > tr:nth-child(3) > td.title > div > a  # Capernaum
#old_content > table > tbody > tr:nth-child(4) > td.title > div > a  # A Dog's Purpose
#old_content > table > tbody > tr:nth-child(5) > td.title > div > a  # Shusenjo

The ":nth-child(2) > td.title > div > a" part differs for each movie, so leave it out and keep only what the paths have in common:

#old_content > table > tbody > tr

Then assign trs = soup.select('#old_content > table > tbody > tr')

  In other words: find the path of what you want to scrape with Inspect, and write the selector only up to the part all rows share.

trs = soup.select('#old_content > table > tbody > tr')
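To see what soup.select() hands back without depending on the live page, here is a small sketch. SAMPLE_HTML is a made-up miniature of the table (an assumption, not the real Naver markup); only the selector is taken from the notes above.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the ranking table, NOT the real Naver markup.
SAMPLE_HTML = """
<div id="old_content">
  <table>
    <tbody>
      <tr><td class="title"><div><a href="#">Green Book</a></div></td></tr>
      <tr><td class="title"><div><a href="#">Capernaum</a></div></td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# select() returns a list-like collection of every element matching the selector.
trs = soup.select('#old_content > table > tbody > tr')
print(len(trs))  # how many rows matched
```

Because the result behaves like a list, it can go straight into a for loop, which is what the next steps do.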

 

Green Book: finding the path to the title

#old_content > table > tbody > tr:nth-child(2) > td.title > div > a

Inside its row, 'Green Book' sits at: td.title > div > a

 

Use a for loop and store the element holding the movie name in a_tag

.select_one picks out just one specific element (the first match)

for tr in trs:
    a_tag = tr.select_one('td.title > div > a')
    print(a_tag)  # print each result to see what was found

 

If you run the code above,

you can see None showing up in between the results.
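A likely reason for the None values is that some rows in the table have no title link at all (for example, divider rows). The sketch below reproduces that on a made-up table; SAMPLE_HTML and its divider row are assumptions for illustration, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical table: the middle row is a divider with no title link.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td class="title"><div><a href="#">Green Book</a></div></td></tr>
    <tr><td colspan="2"></td></tr>
    <tr><td class="title"><div><a href="#">Capernaum</a></div></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
trs = soup.select('table > tbody > tr')

# select_one() returns the first match inside each row, or None when nothing matches.
a_tags = [tr.select_one('td.title > div > a') for tr in trs]
for a_tag in a_tags:
    print(a_tag)
```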

 

To skip these None values, add an if check

for tr in trs:
    a_tag = tr.select_one('td.title > div > a')
    if a_tag is not None:
        print(a_tag)  # only rows that actually contain a title link

 

Below that, assign the rank, title, and star rating

 

Rank = rank

for tr in trs:
    a_tag = tr.select_one('td.title > div > a')
    if a_tag is not None:
        rank = tr.select_one('td:nth-child(1) > img')['alt']

Here we take only the alt attribute, which holds the rank (be careful not to leave out the quotes ' ')
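A quick sketch of why the quotes matter: attributes on a tag are looked up by a string key, like a dictionary. The row below is made up for illustration (the file name and alt value are assumptions).

```python
from bs4 import BeautifulSoup

# Hypothetical row: the rank is stored in the image's alt attribute.
SAMPLE_HTML = """
<table><tbody>
  <tr>
    <td><img src="rank01.gif" alt="01"></td>
    <td class="title"><div><a href="#">Green Book</a></div></td>
  </tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
tr = soup.select_one('tr')
img = tr.select_one('td:nth-child(1) > img')

rank = img['alt']           # 'alt' is a string key, so the quotes are required
missing = img.get('title')  # .get() returns None instead of raising KeyError
print(rank, missing)
```

Writing img[alt] without quotes makes Python look for a variable named alt, which fails with a NameError.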

 

Title = title

Use .text to pull out just the text 'Green Book'

for tr in trs:
    a_tag = tr.select_one('td.title > div > a')
    if a_tag is not None:
        title = a_tag.text

 

Star rating = star

for tr in trs:
    a_tag = tr.select_one('td.title > div > a')
    if a_tag is not None:
        star = tr.select_one('td.point').text
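One thing worth knowing about .text: it returns the text exactly as it appears in the HTML, including any surrounding whitespace. The sketch below uses a made-up cell (the markup and the score are assumptions) to compare .text with get_text(strip=True).

```python
from bs4 import BeautifulSoup

# Hypothetical row where the score is surrounded by newlines and spaces.
SAMPLE_HTML = """
<table><tbody>
  <tr>
    <td class="title"><div><a href="#">Green Book</a></div></td>
    <td class="point">
      9.59
    </td>
  </tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
tr = soup.select_one('tr')

raw_star = tr.select_one('td.point').text              # keeps the whitespace
star = tr.select_one('td.point').get_text(strip=True)  # whitespace trimmed
print(repr(raw_star), repr(star))
```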

 

 

Putting it all together

for tr in trs:
    a_tag = tr.select_one('td.title > div > a')
    if a_tag is not None:
        rank = tr.select_one('td:nth-child(1) > img')['alt']
        title = a_tag.text
        star = tr.select_one('td.point').text

        print(rank, title, star)

 

When you run it,

the rank, title, and star rating are printed correctly
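If you want to keep the results rather than only print them, one option is to collect each movie into a list of dictionaries. This is just a sketch: scrape_movies and SAMPLE_HTML are hypothetical names, and the sample markup and scores are made up so the code runs without the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the ranking table, including one divider row.
SAMPLE_HTML = """
<div id="old_content">
  <table>
    <tbody>
      <tr>
        <td><img src="rank01.gif" alt="01"></td>
        <td class="title"><div><a href="#">Green Book</a></div></td>
        <td class="point">9.59</td>
      </tr>
      <tr><td colspan="3"></td></tr>
      <tr>
        <td><img src="rank02.gif" alt="02"></td>
        <td class="title"><div><a href="#">Capernaum</a></div></td>
        <td class="point">9.58</td>
      </tr>
    </tbody>
  </table>
</div>
"""


def scrape_movies(html):
    """Return a list of {'rank', 'title', 'star'} dicts, one per movie row."""
    soup = BeautifulSoup(html, 'html.parser')
    movies = []
    for tr in soup.select('#old_content > table > tbody > tr'):
        a_tag = tr.select_one('td.title > div > a')
        if a_tag is None:
            continue  # skip rows without a title link
        movies.append({
            'rank': tr.select_one('td:nth-child(1) > img')['alt'],
            'title': a_tag.text,
            'star': tr.select_one('td.point').text,
        })
    return movies


movies = scrape_movies(SAMPLE_HTML)
for movie in movies:
    print(movie['rank'], movie['title'], movie['star'])
```

With the real page, you would pass data.text in place of SAMPLE_HTML.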

 

 


 

 

As the difficulty keeps going up, I can feel my learning pace slowing down.

 

Even on practice problems I often can't react right away, and end up having to work through them alongside the solution, like with this quiz..

 

But still, little by little..

 

Like a sprout breaking out of its shell

 

I haven't properly bloomed yet, so

 

Don't give up, let's keep going!!
