SoFunction
Updated on 2024-11-12

Python example: reading a web page and segmenting its text into words

This example shows how to read a web page in Python and split its text into words. It is shared here for your reference:

Word segmentation here uses jieba, the most popular Chinese word-segmentation package. Reference: /fxsjy/jieba

Code:

import requests
from bs4 import BeautifulSoup
import jieba

# Fetch the HTML (prepend the site's scheme and host to this path)
url = "/a/20180328/16049779_0.shtml"
res = requests.get(url)
res.encoding = 'utf-8'
content = res.text

# Parse with bs4 ('html.parser', the standard-library parser, is assumed here)
soup = BeautifulSoup(content, 'html.parser')
div = soup.find(id='main_content')

# Write the text of each <p> tag in the main content to a file, one per line
filename = ''  # set this to the output file path
with open(filename, 'w', encoding='utf-8') as file_object:
    for line in div.find_all('p'):
        file_object.write(line.get_text() + '\n')

# Try out the word-segmentation tool (jieba targets Chinese text)
seg_list = jieba.cut("I came to Tsinghua University in Beijing.", cut_all=True)
print("Full Mode: " + "/ ".join(seg_list))  # full mode
seg_list = jieba.cut("I came to Tsinghua University in Beijing.", cut_all=False)
print("Default Mode: " + "/ ".join(seg_list))  # precise mode
seg_list = jieba.cut("He came to the NetEase HangYan Building.")  # precise mode is the default
print(", ".join(seg_list))

# Segment the saved text line by line and write the result to a second file
with open(filename, 'r', encoding='utf-8') as file_object:
    with open('cut_news.txt', 'w', encoding='utf-8') as file_cut_object:
        for line in file_object:
            seg_list = jieba.cut(line, cut_all=False)
            file_cut_object.write('/'.join(seg_list))
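
To make the difference between the two modes used above more concrete, here is a toy sketch in plain Python. It is not how jieba works internally: the tiny dictionary, the function names cut_full and cut_precise, and the forward-maximum-matching rule are all assumptions made for illustration (jieba itself relies on a large frequency dictionary plus an HMM for words missing from it). Full mode lists every dictionary word found in the sentence, overlaps included, while precise mode picks a single non-overlapping split.

```python
# Toy dictionary, made up for this example (jieba ships a far larger one).
DICTIONARY = {"来到", "北京", "清华", "清华大学", "华大", "大学"}


def cut_full(sentence, dictionary):
    """Full-mode sketch: list every dictionary word found, overlaps included."""
    words = []
    covered_until = 0  # index up to which an emitted word already reaches
    for i in range(len(sentence)):
        matches = [
            sentence[i:j]
            for j in range(i + 1, len(sentence) + 1)
            if sentence[i:j] in dictionary
        ]
        if matches:
            words.extend(matches)
            covered_until = max(covered_until, i + len(matches[-1]))
        elif i >= covered_until:
            # No dictionary word covers this character: emit it on its own.
            words.append(sentence[i])
            covered_until = i + 1
    return words


def cut_precise(sentence, dictionary, max_len=4):
    """Precise-mode sketch using forward maximum matching (an assumption,
    not jieba's actual algorithm): always take the longest word that fits."""
    words = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


sentence = "我来到北京清华大学"
print("Full mode:    " + "/ ".join(cut_full(sentence, DICTIONARY)))
print("Precise mode: " + "/ ".join(cut_precise(sentence, DICTIONARY)))
```

With this toy dictionary, the full-mode list contains overlapping words such as 清华, 清华大学 and 华大, whereas the precise-mode list covers the sentence exactly once.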

Crawl results:
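
As a rough illustration of what the crawl step collects, the sketch below applies the same logic (find the element whose id is main_content, then take the text of each <p> inside it) to a small inline HTML snippet. To keep it dependency-free it uses the standard library's html.parser instead of BeautifulSoup; the class name ParagraphCollector, the helper extract_paragraphs and the sample HTML are made up for this example, and the parser assumes well-formed markup with every <p> closed.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of every <p> inside the element with a given id."""

    # A few common void elements; they have no closing tag, so skip them
    # when tracking nesting depth.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0  # nesting depth inside the container (0 = outside)
        self.in_paragraph = False
        self.buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            if tag == "p":
                self.in_paragraph = True
                self.buffer = []
        elif dict(attrs).get("id") == self.container_id:
            self.depth = 1  # entered the container element

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or not self.depth:
            return
        if tag == "p" and self.in_paragraph:
            self.paragraphs.append("".join(self.buffer).strip())
            self.in_paragraph = False
        self.depth -= 1

    def handle_data(self, data):
        if self.in_paragraph:
            self.buffer.append(data)


def extract_paragraphs(html, container_id="main_content"):
    parser = ParagraphCollector(container_id)
    parser.feed(html)
    parser.close()
    return parser.paragraphs


sample = """
<html><body>
<div id="nav"><p>Home</p></div>
<div id="main_content">
  <p>First paragraph.</p>
  <p>Second <b>bold</b> paragraph.</p>
</div>
<p>Footer</p>
</body></html>
"""

for paragraph in extract_paragraphs(sample):
    print(paragraph)
```

Only the two paragraphs inside main_content are kept; the navigation and footer paragraphs are ignored. With BeautifulSoup, soup.find(id=...) followed by find_all('p') does the same job in two lines.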

Segmentation results:
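
One detail of the final loop is worth noting: each line read from the file still ends with its newline, and that newline is handed to the segmenter along with the text. The sketch below shows one way to strip it first and write exactly one '/'-joined line per paragraph. The helper name segment_lines is hypothetical, and str.split stands in for jieba.cut so the example runs without jieba installed.

```python
def segment_lines(lines, cut):
    """Yield one '/'-joined output line for each non-empty input line.

    `cut` is any callable that turns a string into an iterable of words,
    for example jieba.cut; str.split is used below as a stand-in.
    """
    for line in lines:
        text = line.strip()  # drop the trailing newline before segmenting
        if not text:
            continue  # skip blank lines instead of writing empty output
        yield "/".join(cut(text)) + "\n"


lines = ["I came to Beijing.\n", "\n", "He left.\n"]
result = list(segment_lines(lines, str.split))
for out_line in result:
    print(out_line, end="")
```

With jieba installed, passing jieba.cut as the cut argument and the open input file as lines reproduces the loop from the script, with each output line ending in a single newline.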

I hope this article helps you with your Python programming.