SoFunction
Updated on 2024-11-19

Differences between jieba.cut and jieba.lcut, with notes

Difference between jieba.cut and jieba.lcut

jieba.cut returns a generator, which means each word in it can be fetched with a for loop.

word_list = [word for word in jieba.cut(text)]

jieba.lcut returns a list directly.

word_list = jieba.lcut(text)

Prefix dict has been built succesfully.
Full Mode: 我/来/北京/上学
['我', '来', '北京', '上学']
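One practical consequence of the generator versus list distinction is that a generator can only be walked once. The sketch below illustrates this. The sample sentence is chosen here for illustration, and the fallback `cut` function is a stand-in (not part of jieba) that is only used when jieba is not installed, so the snippet runs either way.

```python
try:
    from jieba import cut  # the real segmenter, if jieba is installed
except ImportError:
    def cut(sentence):
        """Stand-in (NOT jieba): yields one character at a time."""
        yield from sentence

text = "我来北京上学"  # sample sentence, assumed for illustration

words = cut(text)            # a generator: nothing is stored yet
first_pass = list(words)     # walking it consumes the generator
second_pass = list(words)    # already exhausted, so this is empty

word_list = list(cut(text))  # a list can be indexed and reused freely

print(first_pass)
print(second_pass)           # []
print(word_list == first_pass)
```

If the words are needed more than once (for example, to count them and then print them), a list is the safer choice.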

Several word segmentation interfaces in jieba: cut, lcut, posseg.cut, posseg.lcut

  • cut

cut provides the most basic word segmentation functionality. It returns a generator, and the individual words are obtained by iterating over it.

  • lcut

The difference between lcut and cut is that lcut returns a list. jieba.lcut(s) is equivalent to list(jieba.cut(s)).

  • posseg methods

posseg.cut and posseg.lcut differ in the same way (generator versus list), except that posseg also provides the part of speech of each word, which facilitates syntactic analysis.
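As a rough illustration of when the generator form is convenient, the sketch below takes only the first two words and then stops pulling from the generator. The sample sentence is an assumption made for this example, and the fallback `cut` is a stand-in (not jieba) used only when jieba is not installed.

```python
from itertools import islice

try:
    from jieba import cut  # the real segmenter, if jieba is installed
except ImportError:
    def cut(sentence):
        """Stand-in (NOT jieba): yields one character at a time."""
        yield from sentence

text = "我来北京上学"  # sample sentence, assumed for illustration

# islice stops requesting words after the first two,
# so no full list of words is ever built.
head = list(islice(cut(text), 2))
print(head)
```

With the list-returning variants the whole result is built before anything can be sliced, which is usually fine for short strings.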

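# English rendering of the original Chinese sentence, which begins 我们都是;
# the outputs in the comments below come from the Chinese text.
s = "We're all little frogs, we croak, we croak, we like to be happy, and we tell jokes."
import jieba
jieba.cut(s)  # <generator object cut at 0x10a6e5500>
list(jieba.cut(s))  # [u'\u6211\u4eec', u'\u90fd', u'\u662f',...]
jieba.lcut(s)  # [u'\u6211\u4eec', u'\u90fd', u'\u662f',...]
import jieba.posseg
jieba.posseg.cut(s)  # <generator object cut at 0x10cc80eb0>
list(jieba.posseg.cut(s))  # [pair(u'\u6211\u4eec', u'r'), pair(u'\u90fd', u'd')...]
jieba.posseg.lcut(s)  # [pair(u'\u6211\u4eec', u'r'), pair(u'\u90fd', u'd')...]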
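The part-of-speech flags shown above can be used to filter words. The following is a minimal sketch: `words_with_flag` is a hypothetical helper written for this example (not a jieba function), and the sample pairs are the first two pairs from the output above, written as plain tuples so the snippet runs without jieba.

```python
def words_with_flag(pairs, prefix):
    """Keep the words whose part-of-speech flag starts with `prefix`.

    `pairs` is any iterable of (word, flag) items.
    """
    return [word for word, flag in pairs if flag.startswith(prefix)]

# First two pairs from the output above: 我们 tagged 'r', 都 tagged 'd'.
sample = [("\u6211\u4eec", "r"), ("\u90fd", "d")]

pronouns = words_with_flag(sample, "r")
print(pronouns)
```

With jieba installed, the same helper should accept jieba.posseg.cut(s) or jieba.posseg.lcut(s) directly, since each pair can be unpacked into a word and a flag.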

Summary

The above is based on my personal experience. I hope it serves as a useful reference, and I appreciate your continued support.