This article describes a Python implementation of fetching a web page and segmenting its text into words. It is shared for your reference, as follows:
The word segmentation here uses the most popular Chinese segmentation package, jieba (ref: /fxsjy/jieba).
Alternatively, you can download the jieba library from this site.
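Before running the example, jieba can also be installed with pip (pip install jieba). The following is only a minimal sketch, assuming jieba is already installed, to verify that the import and a basic segmentation call work; the lcut call and the sample sentence come from jieba's own documentation:

# Minimal sanity check for a jieba installation (sketch, not part of the article's code)
import jieba

words = jieba.lcut("我来到北京清华大学")  # "I came to Tsinghua University in Beijing."
print(words)  # expected to be something like ['我', '来到', '北京', '清华大学']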
Code:
import requests
from bs4 import BeautifulSoup
import jieba

# Fetch the html page
# Note: only the path survived in the original article; the site's domain
# must be prepended to this URL for requests.get to succeed.
url = "/a/20180328/16049779_0.shtml"
res = requests.get(url)
res.encoding = 'utf-8'
content = res.text

# Parse with bs4 (built-in html.parser)
soup = BeautifulSoup(content, 'html.parser')
div = soup.find(id='main_content')

# Write the article text to a file
filename = 'news.txt'  # assumed filename; the original name was lost
with open(filename, 'w', encoding='utf-8') as file_object:
    # <p> tag handling: write the text of each paragraph
    for line in div.find_all('p'):
        file_object.write(line.get_text() + '\n')

# Basic use of the word-segmentation tool
seg_list = jieba.cut("我来到北京清华大学", cut_all=True)  # "I came to Tsinghua University in Beijing."
print("Full Mode: " + "/ ".join(seg_list))  # full mode

seg_list = jieba.cut("我来到北京清华大学", cut_all=False)
print("Default Mode: " + "/ ".join(seg_list))  # precise mode

seg_list = jieba.cut("他来到了网易杭研大厦")  # "He came to the NetEase Hangyan Building."; the default is precise mode
print(", ".join(seg_list))

# Segment the crawled article line by line
with open(filename, 'r', encoding='utf-8') as file_object:
    with open('cut_news.txt', 'w', encoding='utf-8') as file_cut_object:
        for line in file_object.readlines():
            seg_list = jieba.cut(line, cut_all=False)
            file_cut_object.write('/'.join(seg_list))
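Besides the full and precise modes used above, jieba also provides a search-engine mode that splits long words into shorter ones again, which is useful when building a search index. The sketch below uses jieba's documented cut_for_search function and the example sentence from jieba's README; it is an optional extra, not part of the article's crawling code:

# Search-engine mode: re-splits long words for finer-grained indexing (sketch)
import jieba

seg_list = jieba.cut_for_search("小明硕士毕业于中国科学院计算所，后在日本京都大学深造")
# "Xiao Ming got his master's degree at the Institute of Computing Technology,
#  Chinese Academy of Sciences, and later continued his studies at Kyoto University in Japan."
print(", ".join(seg_list))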
Crawl results:
Segmentation results:
Readers interested in more Python-related content can check out this site's topics: "Summary of Python Math Operations Tips", "Python Data Structures and Algorithms Tutorial", "Summary of Python Function Usage Tips", "Summary of Python String Manipulation Techniques", "Python Introductory and Advanced Classic Tutorials", and "Summary of Python File and Directory Manipulation Techniques".
I hope this article is helpful to you in your Python programming.