🏗️Software/💻python

    Preliminary steps before doing Korean NLP

    1. tweepy: downgrading tweepy
        !pip install tweepy==3.10.0
        import tweepy
        tweepy.__version__
    The konlpy package uses the StreamListener class from the tweepy package, and importing it fails from tweepy version 4 onward, because Tweepy 4 merged StreamListener into a single Stream class. In other words, you need to install version 3. The version I had been using was 4.4.0.
        Successfully installed PySocks-1.7.1 tweepy-3.10.0
        '4.4.0'
    2. Installing Mecab on Windows
    1) Create a Mecab folder on the C drive
    2) https://github.com/Pusn..
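The downgrade is needed because konlpy's import relies on tweepy's StreamListener, which no longer exists in tweepy 4. Below is a minimal sketch that checks which tweepy major version is installed before konlpy is imported. It assumes Python 3.8+ for `importlib.metadata`. The helper names `tweepy_major_version` and `needs_tweepy_downgrade` are hypothetical. `importlib.metadata` reads the installed package metadata on disk, not the module already loaded into a running notebook kernel.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def tweepy_major_version() -> Optional[int]:
    """Return the installed tweepy major version, or None if tweepy is not installed."""
    try:
        # Reads the installed distribution's metadata, not the module
        # that may already be imported in the running kernel.
        return int(version("tweepy").split(".")[0])
    except PackageNotFoundError:
        return None


def needs_tweepy_downgrade(major: Optional[int]) -> bool:
    """True when tweepy 4+ is installed, i.e. StreamListener is gone (per the post)."""
    return major is not None and major >= 4


major = tweepy_major_version()
if needs_tweepy_downgrade(major):
    # Hypothetical reminder text; restart the kernel so the new install is picked up
    print("tweepy", major, "found: pip install tweepy==3.10.0, then restart the kernel")
```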

    "If using all scalar values, you must pass an index" error: solved

        in getData(driver)
            249 # print(f'{row} : title/view/date')
            250 # turn it into a df
        --> 251 data_df = pd.DataFrame({
            252     'title' : title,
            253     'views' : view,
    The fix is to turn the values inside the DataFrame into lists. The original code:
        # row = [title, view, date]
        # print(f'{row} : title/view/date')
        # turn it into a df
        data_df = pd.DataFrame({
            'title' : title,
            'views' : view,
            'upload_dates' : date
        })
        print(data_df)
        return data_df
    Solved.
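Below is a minimal reproduction of the error and of the fix the post describes, which is wrapping each value in a list. The values for `title`, `view` and `date` are hypothetical stand-ins for the crawled ones. Passing `index=[0]` is shown as an equivalent alternative, and both forms give a one-row DataFrame.

```python
import pandas as pd

# Hypothetical scalar values standing in for one crawled post
title, view, date = "Sample post", 1234, "2021-11-30"

# Only scalars and no index: pandas cannot tell how many rows to build
try:
    pd.DataFrame({"title": title, "views": view, "upload_dates": date})
except ValueError as err:
    print(err)  # If using all scalar values, you must pass an index

# Fix described in the post: wrap each value in a list -> a one-row DataFrame
data_df = pd.DataFrame({"title": [title], "views": [view], "upload_dates": [date]})

# Equivalent alternative: keep the scalars and pass an explicit index
data_df_indexed = pd.DataFrame(
    {"title": title, "views": view, "upload_dates": date}, index=[0]
)
print(data_df)
```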

    Setting up KoNLPy for Korean NLP and the Mecab morphological analyzer in Jupyter Notebook under VS Code (installation)

    =================
    OS: Windows 10 64-bit
    Language: Python 3.7.0 (set up so that pip install can be run from the command prompt)
    CPU: AMD Ryzen 7 3700X 8-Core Processor 3.59 GHz
    RAM: 24.0 GB
    Graphics card (GPU): GeForce RTX 2060 SUPER
    =================
    1. Introduction to KoNLPy
    KoNLPy (pronounced "ko en el PIE") is a Python package for Korean language processing that provides a variety of APIs (classes). (※ KoNLPy's Mecab() class is not supported on Windows.)
    2. Environment setup
    Java and JPype must be installed before KoNLPy can be used for Korean NLP.
    Install Java 1.7+
    JAVA_HOME environment variable ..
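KoNLPy will not work until Java is installed with JAVA_HOME set and JPype is available. Below is a small stdlib sketch that checks those prerequisites from inside the notebook. The function name `konlpy_prereqs` is hypothetical. It only checks that JAVA_HOME points at an existing directory and that the `jpype` module (installed by the JPype1 package) is importable. It does not verify that the Java version is 1.7+.

```python
import importlib.util
import os


def konlpy_prereqs():
    """Report whether the prerequisites named in the post look available."""
    java_home = os.environ.get("JAVA_HOME")
    return {
        # JAVA_HOME must be defined for KoNLPy to locate the JVM
        "JAVA_HOME set": bool(java_home),
        # ...and should point at a real directory
        "JAVA_HOME is a directory": bool(java_home) and os.path.isdir(java_home),
        # find_spec returns None when the module cannot be imported
        "jpype importable": importlib.util.find_spec("jpype") is not None,
    }


print(konlpy_prereqs())
```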

    "1 columns passed, passed data had 44 columns": a list-related error

    I was returning the crawled data as lists and appending them to another list, but building the DataFrame kept failing with this error. It turned out a nested (two-level) list had formed, which is why the DataFrame could not be built. In other words, rather than accumulating the data in a nested list, each item should be created as a list and turned into a df right inside the for loop, with the comment rows appended one after another at the bottom.
        def getCom(driver):
            # get the title, view count, and comments
            html = driver.page_source
            soup = BeautifulSoup(html, 'html.parser')
            comments = []
            try:
                # crawl the keyword and all of its comments 5 - table 2
                comm = soup.select('div#cont..
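Below is a sketch of the approach the post arrives at. It builds a small DataFrame per loop iteration and stacks them instead of accumulating a nested list. The function name `comments_to_df` and the sample comments are hypothetical. `pd.concat` is used for the stacking because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd


def comments_to_df(pages):
    """Stack per-page comment lists into one DataFrame with a single 'comment' column."""
    frames = []
    for page_comments in pages:
        # One frame per page: each comment becomes a row, not a column
        frames.append(pd.DataFrame({"comment": page_comments}))
    if not frames:
        return pd.DataFrame(columns=["comment"])
    # Append the per-page frames one below another
    return pd.concat(frames, ignore_index=True)


# The failing pattern: a nested list is read as 1 row holding 3 values,
# which does not fit a single column name.
nested = [["great video", "thanks", "lol"]]
try:
    pd.DataFrame(nested, columns=["comment"])
except (ValueError, AssertionError) as err:
    print(err)

df = comments_to_df([["great video", "thanks"], ["lol"]])
print(df)
```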

    The tags produced by nltk pos_tag

    Running a sentence through word_tokenize (tokenization) and then pos_tag (part-of-speech tagging) tags each word in the sentence with its POS (part of speech). The tags are as follows.
        import nltk
        nltk.download('punkt')
        from nltk import word_tokenize
        words = word_tokenize("Think like man of action and act like man of thought")
        words  # tokenize the sentence first, then attach the POS tags
        [nltk_data] Downloading package punkt to
        [nltk_data]     C:\Users\AppData\Roaming\nltk_data...
        [nltk_data]   Package punk..
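By default, `nltk.pos_tag` returns (word, tag) pairs using the Penn Treebank tag set. The sketch below maps a subset of those tags to readable descriptions. The `PENN_TAGS` dictionary and the `describe` helper are illustrative and not part of NLTK. Tags outside the subset fall back to "other/unlisted".

```python
# A subset of the Penn Treebank tags that nltk.pos_tag uses by default
PENN_TAGS = {
    "CC": "coordinating conjunction",
    "CD": "cardinal number",
    "DT": "determiner",
    "IN": "preposition or subordinating conjunction",
    "JJ": "adjective",
    "MD": "modal",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "PRP": "personal pronoun",
    "RB": "adverb",
    "TO": "to",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person singular present",
}


def describe(tagged):
    """Attach a readable description to each (word, tag) pair."""
    return [(word, tag, PENN_TAGS.get(tag, "other/unlisted")) for word, tag in tagged]


print(describe([("act", "VB"), ("like", "IN"), ("thought", "NN")]))
```

With NLTK available, the tagger output can be passed straight in: `describe(nltk.pos_tag(words))`. pos_tag also needs its tagger model downloaded, e.g. `nltk.download('averaged_perceptron_tagger')`; newer NLTK releases may use a different resource name.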

    VS Code error while setting up a Python virtual environment: Kernel process Exited

        from keras.preprocessing.text import text_to_word_sequence
        sentence = 'Where there\'s a will, there\'s a way'
        text_to_word_sequence(sentence)
    I had been running this code on Python 3.8.8, then lowered the version to 3.7 in order to use CUDA. When I ran it again, VS Code reported that the virtual environment had no Jupyter Notebook and needed to install it, and then the following error occurred.
    Solution
    Run conda install ipykernel --update-deps --force-reinstall in the terminal. The following output then appears:
        ## Package Plan ##
        environment location..
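For reference while the kernel is being repaired, below is a stdlib sketch of what keras' `text_to_word_sequence` does with its default arguments. It assumes keras' documented defaults: lower-casing, turning each character in the default filter string into a space, splitting on spaces and dropping empty tokens. The name `text_to_word_sequence_sketch` is hypothetical and this is not keras itself. The apostrophe is not among the default filters, so "there's" stays one token.

```python
# Default filters of keras' text_to_word_sequence (the apostrophe is not included)
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def text_to_word_sequence_sketch(text, filters=DEFAULT_FILTERS, lower=True, split=" "):
    """Stdlib re-creation of the keras helper's default behaviour (an assumption, not keras)."""
    if lower:
        text = text.lower()
    # Replace every filtered character with the split character
    text = text.translate(str.maketrans({c: split for c in filters}))
    # Split and drop the empty strings left by consecutive separators
    return [token for token in text.split(split) if token]


sentence = "Where there's a will, there's a way"
print(text_to_word_sequence_sketch(sentence))
# ['where', "there's", 'a', 'will', "there's", 'a', 'way']
```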

    Learning the RNN structure

    1. Let's learn the RNN structure with a few words
        import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt
    Building the dataset
    Applying an RNN one character at a time: hello, apple, hobby, daddy, bobby
        # input data: hell, appl, hobb, dadd, bobb
        # processed in 4 recurrent steps
        # target data: o, e, y, y, y
        # timesteps = 4
    Converting characters to numbers with one-hot encoding
    Across all inputs and targets, the characters that appear are h, e, l, o, a, p, b, y, d: 9 in total.
        a = ['hello', 'apple..
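Below is a NumPy sketch of the one-hot step for this dataset. The vocabulary is built in first-seen order, which reproduces the h, e, l, o, a, p, b, y, d ordering listed in the post. The first four characters of each word form the input sequence and the fifth is the target. Variable names are assumptions, not the notebook's own. The resulting input shape (samples, timesteps, features) is the layout Keras recurrent layers expect.

```python
import numpy as np

words = ["hello", "apple", "hobby", "daddy", "bobby"]
# Vocabulary: every character in inputs + targets, in first-seen order
chars = list(dict.fromkeys("".join(words)))
char_to_idx = {c: i for i, c in enumerate(chars)}
timesteps = 4  # the first 4 characters form the input sequence


def one_hot(ch):
    """Vector of zeros with a single 1 at the character's vocabulary index."""
    vec = np.zeros(len(chars), dtype=np.float32)
    vec[char_to_idx[ch]] = 1.0
    return vec


# X: (samples, timesteps, vocab) inputs; y: (samples, vocab) next-character targets
X = np.array([[one_hot(c) for c in w[:timesteps]] for w in words])
y = np.array([one_hot(w[timesteps]) for w in words])
print(chars, X.shape, y.shape)
```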

    Putting the RTX 2060 SUPER to work in VS Code (feat. let's actually use the GPU with CUDA)

    An aside: all this time I had been wasting my GPU. I can't believe I was using only the CPU. Why wasn't I making good use of a good computer... it's lamentable. Still, what a relief to have found out now.
    Most important: refer to the installed versions listed at the very bottom of the post! If you zoom in, it prints True and looks as if GPU support works, but in reality it doesn't.
    Commands used:
        import tensorflow as tf
        tf.__version__                       # check the installed TensorFlow version
        tf.test.is_built_with_cuda()         # check whether TensorFlow was built with CUDA
        tf.test.is_built_with_gpu_support()  # check whether it was built with GPU support such as CUDA
        tf.test.gpu_device_name()            # available gp..
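As the post notes, `is_built_with_cuda()` returning True only describes how the TensorFlow binary was built. Below is a sketch that lists the GPUs TensorFlow actually detects at runtime. It assumes TensorFlow 2.1+, where `tf.config.list_physical_devices` is available. An empty list means TensorFlow will fall back to the CPU. The wrapper name `visible_gpus` is hypothetical.

```python
def visible_gpus():
    """Return names of GPUs TensorFlow detects at runtime, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Unlike tf.test.is_built_with_cuda(), this reports devices found on this machine now
    return [d.name for d in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print("TensorFlow not installed" if gpus is None else gpus)
```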

    Crawling: parallel processing

    This was done with just one virtual environment, and it doesn't take up much CPU or memory.
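The post gives no code for the parallel crawl, so the following is only one possible sketch. It assumes the work is I/O-bound (mostly waiting on network responses) and runs a fetch function on a thread pool from the standard library's `concurrent.futures`, inside one process and one virtual environment. `crawl_all` and the `fetch` callable are hypothetical. If each fetch drives a browser such as a Selenium driver, give each task its own driver rather than sharing one across threads.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_all(urls, fetch, max_workers=4):
    """Run fetch(url) for every URL on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as urls
        return list(pool.map(fetch, urls))


# Stand-in fetch so the sketch runs offline; swap in a real HTTP call to crawl
print(crawl_all(["page-1", "page-2", "page-3"], str.upper))
```

With the requests package installed, a real call might look like `crawl_all(urls, lambda u: requests.get(u, timeout=10).text)`.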

    [Baekjoon] 10798. Reading Vertically

    The input consists of five lines in total. Each line contains at least 1 and at most 15 characters, given consecutively with no spaces. Each character is an uppercase letter 'A' to 'Z', a lowercase letter 'a' to 'z', or a digit '0' to '9'. No line starts or ends with a space.
        # read vertically
        space = []
        for _ in range(5):
            space.append([0] * 15)
        space
    Output:
        [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, ..
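The problem asks for the characters read column by column, top to bottom, skipping positions where a row has no character. The grid in the post pads every row to 15 columns. The alternative sketch below skips padding entirely and simply passes over rows that are too short for the current column. The function name `read_vertically` is hypothetical, and this is not the post's own solution.

```python
def read_vertically(lines):
    """Read column by column, top to bottom, skipping rows too short for that column."""
    width = max(len(line) for line in lines)
    out = []
    for col in range(width):
        for line in lines:
            if col < len(line):  # shorter rows simply contribute nothing here
                out.append(line[col])
    return "".join(out)


# On the judge: print(read_vertically([input().strip() for _ in range(5)]))
print(read_vertically(["AABCDD", "afzz", "09121", "a8EWg6", "P5h3kx"]))
```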