The difference between cut and lcut
jieba.cut returns a generator, which means each word in it can be fetched with a for loop.
word_list = [word for word in jieba.cut(text)]
jieba.lcut, by contrast, directly returns a list.
Example output:
Prefix dict has been built successfully.
Full Mode: I/ come/ Beijing/ go to school
['I', 'come', 'Beijing', 'go to school']
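A minimal sketch of that difference; the sample sentence and variable names below are assumptions, not the original run:

import jieba

text = '我来北京上学'   # assumed sample sentence: "I came to Beijing to go to school"

# cut returns a generator; joining it with '/' (cut_all=True is jieba's "Full Mode")
print('Full Mode: ' + '/'.join(jieba.cut(text, cut_all=True)))

# a list comprehension materialises the generator into a list
word_list = [word for word in jieba.cut(text)]
print(word_list)        # e.g. ['我', '来', '北京', '上学']

# lcut returns a list directly, without the extra conversion step
print(jieba.lcut(text))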
jieba provides several word-segmentation interfaces: cut, lcut, and posseg.
cut
cut provides the most basic word-segmentation functionality and returns a generator; the individual words are obtained by iterating over it.
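For instance, the generator can be iterated directly; the sentence here is only an assumed placeholder:

import jieba

seg = jieba.cut('我来北京上学')   # assumed sample sentence
print(seg)                        # a generator object, e.g. <generator object cut at 0x...>
for word in seg:
    print(word)                   # one word per line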
lcut
The difference between lcut and cut is that lcut returns a list; jieba.lcut(s) is equivalent to list(jieba.cut(s)).
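A quick way to see that equivalence (sample sentence assumed):

import jieba

s = '我来北京上学'                            # assumed sample sentence
print(jieba.lcut(s) == list(jieba.cut(s)))    # True: lcut is just cut materialised into a list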
posseg
posseg works much like cut and lcut, except that it also returns each word's part-of-speech tag, which facilitates syntactic analysis.
s = '我们都是小青蛙，呱呱呱，我们喜欢开心，讲笑话'  # "We're all little frogs, croak croak croak, we like to be happy and tell jokes"

import jieba
jieba.cut(s)                  # <generator object cut at 0x10a6e5500>
list(jieba.cut(s))            # [u'\u6211\u4eec', u'\u90fd', u'\u662f', ...]
jieba.lcut(s)                 # [u'\u6211\u4eec', u'\u90fd', u'\u662f', ...]

import jieba.posseg
jieba.posseg.cut(s)           # <generator object cut at 0x10cc80eb0>
list(jieba.posseg.cut(s))     # [pair(u'\u6211\u4eec', u'r'), pair(u'\u90fd', u'd'), ...]
jieba.posseg.lcut(s)          # [pair(u'\u6211\u4eec', u'r'), pair(u'\u90fd', u'd'), ...]
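The pair objects that posseg yields expose word and flag attributes, which is where the part-of-speech tag lives. A minimal sketch, reusing an assumed sample sentence:

import jieba.posseg as pseg

for p in pseg.cut('我来北京上学'):   # assumed sample sentence
    print(p.word, p.flag)            # each word with its part-of-speech tag, e.g. "我 r"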
Summary
The above is based on my personal experience; I hope it gives you a useful reference, and I hope you will continue to support me.