👶 Naeil Baeumdan / Web Development Comprehensive: Dev Log

[Sparta Coding Club] Web Development Comprehensive Course - Week 3 Dev Log / Homework (Practice / Solution)

 

Already the week 3 dev log..

What has carried me this far is the urge and the passion to learn quickly.

The homework and the quizzes are really, really hard, but course material that pushes students to work things out on their own is not all bad.

It is just personally exasperating.. (feeling wronged, stuck, angry, and so on)

Still, even when it takes a while, the sense of accomplishment from seeing a result I worked for is hard to put into words.

Let's keep this up, improve further, and move forward.

 


 

Problem) Scrape the songs ranked 1 to 50 on Genie Music.

Scrape the rank / song title / artist.

If the output comes out neatly organized, as in the example, success!

 


 

Practice)

import requests
from bs4 import BeautifulSoup

# MongoDB setup carried over from the earlier lessons; db is not used in this exercise
from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client.dbsparta

# Send a browser-like User-Agent header with the request
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://www.genie.co.kr/chart/top200?ditc=D&ymd=20200403&hh=23&rtm=N&pg=1',headers=headers)

# Parse the returned HTML so it can be queried with CSS selectors
soup = BeautifulSoup(data.text, 'html.parser')

 

I start from the code from the previous lessons and get ready to write the scraper.

 

#body-content > div.newest-list > div > table > tbody > tr:nth-child(1) > td.number
#body-content > div.newest-list > div > table > tbody > tr:nth-child(1) > td.info > a.title.ellipsis
#body-content > div.newest-list > div > table > tbody > tr:nth-child(1) > td.info > a.artist.ellipsis
trs = soup.select('#body-content > div.newest-list > div > table > tbody > tr')

I look up the selector paths of the elements I want to fetch, take the part they have in common (everything up to tr),

and assign the result to trs.
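As a small illustration of "take the part they have in common", the sketch below compares the three selectors step by step and keeps only the shared leading steps. The helper name `common_prefix` is made up for this example, and dropping `:nth-child(1)` so the selector matches every row is my own reading of how the `trs` selector was derived.

```python
# The three selectors copied from the browser (same strings as in the post)
selectors = [
    "#body-content > div.newest-list > div > table > tbody > tr:nth-child(1) > td.number",
    "#body-content > div.newest-list > div > table > tbody > tr:nth-child(1) > td.info > a.title.ellipsis",
    "#body-content > div.newest-list > div > table > tbody > tr:nth-child(1) > td.info > a.artist.ellipsis",
]


def common_prefix(paths):
    """Return the leading selector steps shared by every path (hypothetical helper)."""
    split_paths = [p.split(" > ") for p in paths]
    shared = []
    for steps in zip(*split_paths):
        if len(set(steps)) != 1:  # the paths diverge at this step
            break
        shared.append(steps[0])
    return shared


shared = common_prefix(selectors)

# Remove the row index so the selector matches all rows, not just the first one
row_selector = " > ".join(shared).replace(":nth-child(1)", "")
print(row_selector)
```

The remaining steps (`td.number`, `td.info > a.title`, `td.info > a.artist`) are what gets passed to `select_one` on each row.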

 

 

Let's scrape the chart rank first.

for tr in trs:
    rank = tr.select_one('td.number').text
    print(rank)

I hit a wall on the very first item..

When I run this code,

the output comes out messy like this, broken into pieces. A whitespace party..

Here is the problem.

The path of the rank element

I only want the value I am after, but td.number contains both the text 1 and the contents of a span.

Unless I separate the two, this will keep happening.

 

This baffling output threw me into a panic. strip(), the function I had picked up from the hint, was no help at all here.
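A minimal sketch of why strip() alone does not help with the rank. The string `raw_rank` is a hypothetical stand-in for what `tr.select_one('td.number').text` returns (the rank followed by whitespace and the text of the nested span); the real page's text will differ in detail.

```python
# Hypothetical stand-in for tr.select_one('td.number').text:
# the rank, then newlines and indentation, then the text of the nested <span>
raw_rank = "1\n                    \n                        no change\n                    "

# strip() only trims whitespace at the two ends of the string
trimmed = raw_rank.strip()

# repr() makes the otherwise invisible whitespace visible
print(repr(raw_rank))
print(repr(trimmed))
```

The trailing whitespace is gone, but the newlines and the span text in the middle are still there, so the printed rank still spans several lines.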

 

So I was stuck on this one part for tens of minutes..

Then split() flashed into my mind.

I remembered that I had pulled a value out of an unseparated blob like this the same way before.

Run it right away!!

 

for tr in trs:
    rank = tr.select_one('td.number').text.split()
    print(rank[0])

It was super effective.

So this is what people mean by putting what you have learned to use.
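To see why split() works where strip() did not, here is a small sketch on a hypothetical stand-in string (the real text of `td.number` will differ). Called without an argument, `str.split()` breaks the string on any run of whitespace, so the rank ends up as the first item of the list.

```python
# Hypothetical stand-in for tr.select_one('td.number').text
raw_rank = "1\n                    \n                        no change\n                    "

# With no argument, split() splits on runs of whitespace and drops empty pieces
parts = raw_rank.split()

# The rank is the first piece; the rest comes from the nested span
rank = parts[0]

print(parts)
print(rank)
```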

 

Cheering at the visibly different output, I move on to the next piece of information.

 

 

Now let's fetch the track title.

for tr in trs:
    title = tr.select_one('td.info > a.title').text
    print(title)

 

It thoroughly shattered my complacent thought that, since the rank had been the hard one, this one would be easy.

Ugh.. is this the beauty of blank space or something..

I applied the hint here right away.

 

for tr in trs:
    title = tr.select_one('td.info > a.title').text.strip()
    print(title)

.strip() is said to remove the whitespace on both sides of the value, or, when it is given an argument, the characters listed in that argument.
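A short sketch of both behaviours of strip(). The strings are made-up examples, not text taken from the Genie page.

```python
# Hypothetical title text padded with newlines and indentation
raw_title = "\n                                Example Song Title\n                            "

# No argument: remove whitespace from both ends; spaces inside the title stay
title = raw_title.strip()

# With an argument: remove any of the listed characters from both ends instead
tagged = "--Example--".strip("-")

print(repr(title))
print(repr(tagged))
```

Because the title text has whitespace only around it and not a second element inside it, trimming the ends is enough here.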

 

You can see that all that whitespace is gone.

Sooo comfortable

The title works too!! On to the artist name!!

 

for tr in trs:
    name = tr.select_one('td.info > a.artist').text
    print(name)

This one, to its credit, went through without any trouble.

No whitespace at all~ love it~

Now let's print the three variables together.

 

for tr in trs:
    rank = tr.select_one('td.number').text.split()
    title = tr.select_one('td.info > a.title').text.strip()
    name = tr.select_one('td.info > a.artist').text

    print(rank[0], title, name)

Very satisfied!!

The output matches the example given in the problem.

 

 


 

Comparison with the answer code)

My own code)

import requests
from bs4 import BeautifulSoup

from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client.dbsparta

headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://www.genie.co.kr/chart/top200?ditc=D&ymd=20200403&hh=23&rtm=N&pg=1',headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')

trs = soup.select('#body-content > div.newest-list > div > table > tbody > tr')

for tr in trs:
    rank = tr.select_one('td.number').text.split()
    title = tr.select_one('td.info > a.title').text.strip()
    name = tr.select_one('td.info > a.artist').text

    print(rank[0], title, name)

 

Solution)

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://www.genie.co.kr/chart/top200?ditc=D&ymd=20200403&hh=23&rtm=N&pg=1',headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')

trs = soup.select('#body-content > div.newest-list > div > table > tbody > tr')

for tr in trs:
    title = tr.select_one('td.info > a.title.ellipsis').text.strip()
    rank = tr.select_one('td.number').text[0:2].strip()
    artist = tr.select_one('td.info > a.artist.ellipsis').text
    print(rank, title, artist)

 

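Comparing the two, the solution spells out the selector paths more exactly than my code does (a.title.ellipsis, a.artist.ellipsis), and split() does not appear in it.

.text[0:2] seems to do the same job: it keeps the first two characters, which is enough for ranks up to 50, and strip() trims the leftover whitespace from single-digit ranks.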
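The two approaches can be compared side by side on made-up input. The function names and the sample text are hypothetical, and the sketch assumes the text starts directly with the rank digits, which is what the solution code implies.

```python
def rank_by_split(raw):
    """My approach: take the first whitespace-separated piece."""
    return raw.split()[0]


def rank_by_slice(raw):
    """The solution's approach: first two characters, then trim whitespace."""
    return raw[0:2].strip()


results = {}
for digits in ["1", "10", "100"]:
    # Hypothetical stand-in for the text of td.number
    raw = digits + "\n                    \n    no change\n   "
    results[digits] = (rank_by_split(raw), rank_by_slice(raw))
    print(digits, results[digits])
```

Both give the same answer for one- and two-digit ranks, which covers this exercise. Slicing would cut a three-digit rank down to two characters, while split() does not depend on the number of digits.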

 

 


 

 

Even when the result is the same, the code behind it can differ.

Going forward I need to keep running into cases like this and comparing code, to develop an eye for what makes code easier to read, simpler, and more efficient.

I could not treat this as a simple problem, but the point is to build the ability to work through things one at a time like this and produce a result.

It is a truly valuable experience, and it is going to take steady effort.

Let's keep at it and move forward.
