A Summary of Quick Ways to Extract English Words in Python

soogor 2022-09-03 14:54:41
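In day-to-day data processing we sometimes need to pull the individual words out of a passage of text. This article summarizes two ways to extract English words in Python. Method 1 walks through the text character by character with a for loop; Method 2 uses split().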

Method 1:

for-loop iteration

text = '''CHAPTER I   A SERMON ON INNS
The sea was a pale elfin green and the afternoon had already felt the fairy touch of evening as a young woman with dark hair, dressed in a crinkly copper-coloured sort of dress of the artistic order, was walking rather listlessly along the parade of Pebblewick-on-Sea, trailing a parasol and looking out upon the sea’s horizon. She had a reason for looking instinctively out at the sea-line; a reason that many young women have had in the history of the world. But there was no sail in sight.
On the beach below the parade were a succession of small crowds, surrounding the usual orators of the seaside; whether niggers or socialists, whether clowns or clergymen. Here would stand a man doing something or other with paper boxes; and the holiday makers would watch him for hours in the hope of some time knowing what it was that he was doing with them. Next to him would be a man in a top hat with a very big Bible and a very small wife, who stood silently beside him, while he fought with his clenched fist against the heresy of Milnian Sublapsarianism so wide-spread in fashionable watering-places. It was not easy to follow him, he was so very much excited; but every now and then the words “our Sublapsarian friends” would recur with a kind of wailing sneer. Next was a young man talking of nobody knew what (least of all himself), but apparently relying for public favour mainly on having a ring of carrots round his hat. He had more money lying in front of him than the others. Next were niggers. Next was a children’s service conducted by a man with a long neck who beat time with a little wooden spade. Farther along there was an atheist, in a towering rage, who pointed every now and then at the children’s service and spoke of Nature’s fairest things being corrupted with the secrets of the Spanish Inquisition—by the man with the little spade, of course. The atheist (who wore a red rosette) was very withering to his own audience as well. “Hypocrites!” he would say; and then they would throw him money. “Dupes and dastards!” and then they would throw him more money. But between the atheist and the children’s service was a little owlish man in a red fez, weakly waving a green gamp umbrella. His face was brown and wrinkled like a walnut, his nose was of the sort we associate with Judæa, his beard was the sort of black wedge we associate rather with Persia. The young woman had never seen him before; he was a new exhibit in the now familiar museum of cranks and quacks. 
The young woman was one of those people in whom a real sense of humour is always at issue with a certain temperamental tendency to boredom or melancholia; and she lingered a moment, and leaned on the rail to listen.
It was fully four minutes before she could understand a word the man was saying; he spoke English with so extraordinary an accent that she supposed at first that he was talking in his own oriental tongue. All the noises of that articulation were odd; the most marked was an extreme prolongation of the short “u” into “oo”; as in “poo-oot” for “put.” Gradually the girl got used to the dialect, and began to understand the words; though some time elapsed even then before she could form any conjecture of their subject matter. Eventually it appeared to her that he had some fad about English civilisation having been founded by the Turks; or, perhaps by the Saracens after their victory in the Crusades. He also seemed to think that Englishmen would soon return to this way of thinking; and seemed to be urging the spread of teetotalism as an evidence of it. The girl was the only person listening to him.
“Loo-ook,” he said, wagging a curled brown finger, “loo-ook at your own inns” (which he pronounced as “ince”). “Your inns of which you write in your boo-ooks! Those inns were not poo-oot up in the beginning to sell ze alcoholic Christian drink. They were put up to sell ze non-alcoholic Islamic drinks. You can see this in the names of your inns. They are eastern names, Asiatic names. You have a famous public house to which your omnibuses go on the pilgrimage. It is called the Elephant and Castle. That is not an English name. It is an Asiatic name. You will say there are castles in England, and I will agree with you. There is the Windsor Castle. But where,” he cried sternly, shaking his green umbrella at the girl in an angry oratorical triumph, “where is the Windsor Elephant? I have searched all Windsor Park. No elephants.”
The girl with the dark hair smiled, and began to think that this man was better than any of the others. In accordance with the strange system of concurrent religious endowment which prevails at watering-places, she dropped a two shilling piece into the round copper tray beside him. With honourable and disinterested eagerness, the old gentleman in the red fez took no notice of this, but went on warmly, if obscurely, with his argument.
“Then you have a place of drink in this town which you call The Bool!'''
import re  # needed for the re.sub() calls below

# Join the lines with a space so words on either side of a line break stay apart
sentence = text.replace('\n', ' ')
# Strip Chinese characters, then punctuation that should vanish without splitting a word
sentence = re.sub('[\u4e00-\u9fa5]', '', sentence)
sentence = re.sub('[《》“”()]', '', sentence)
# Normalize the curly apostrophe so possessives such as sea's stay intact
sentence = sentence.replace('’', "'")

# Characters that end a word; hyphens are left alone so copper-coloured stays whole
separators = set(' ,.;?!—、。')
result = []
x = 0  # index where the current word starts
y = 0  # index of the character being examined
for i in sentence:
    if i in separators:
        q = sentence[x:y]  # the word between the previous separator and this one
        if q:  # consecutive separators give empty slices; skip them
            print(q)
            result.append(q)
        x = y + 1  # the next word starts just after this separator
    y = y + 1

# Pick up the last word when the text does not end with a separator
q = sentence[x:]
if q:
    print(q)
    result.append(q)

print(set(result))
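
The scanning idea in Method 1 can also be tried on a short string. The following is a minimal sketch, not the listing above: the function name `extract_words`, its default separator set, and the sample sentence are illustrative assumptions.

```python
def extract_words(sentence, separators=" ,.;?!"):
    """Collect the substrings that lie between separator characters."""
    words = []
    start = 0  # index where the current word begins
    for pos, ch in enumerate(sentence):
        if ch in separators:
            if pos > start:  # skip empty slices from consecutive separators
                words.append(sentence[start:pos])
            start = pos + 1
    if start < len(sentence):  # last word when no separator ends the text
        words.append(sentence[start:])
    return words


sample = "But there was no sail in sight. No elephants"
print(extract_words(sample))
# ['But', 'there', 'was', 'no', 'sail', 'in', 'sight', 'No', 'elephants']
```

Passing a different `separators` string is enough to adapt the sketch to other punctuation; wrap the result in `set()` if only the distinct words are needed.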

Method 2:

The split() method

import re

text = '''CHAPTER I   A SERMON ON INNS
The sea was a pale elfin green and the afternoon had already felt the fairy touch of evening as a young woman with dark hair, dressed in a crinkly copper-coloured sort of dress of the artistic order, was walking rather listlessly along the parade of Pebblewick-on-Sea, trailing a parasol and looking out upon the sea’s horizon. She had a reason for looking instinctively out at the sea-line; a reason that many young women have had in the history of the world. But there was no sail in sight.
On the beach below the parade were a succession of small crowds, surrounding the usual orators of the seaside; whether niggers or socialists, whether clowns or clergymen. Here would stand a man doing something or other with paper boxes; and the holiday makers would watch him for hours in the hope of some time knowing what it was that he was doing with them. Next to him would be a man in a top hat with a very big Bible and a very small wife, who stood silently beside him, while he fought with his clenched fist against the heresy of Milnian Sublapsarianism so wide-spread in fashionable watering-places. It was not easy to follow him, he was so very much excited; but every now and then the words “our Sublapsarian friends” would recur with a kind of wailing sneer. Next was a young man talking of nobody knew what (least of all himself), but apparently relying for public favour mainly on having a ring of carrots round his hat. He had more money lying in front of him than the others. Next were niggers. Next was a children’s service conducted by a man with a long neck who beat time with a little wooden spade. Farther along there was an atheist, in a towering rage, who pointed every now and then at the children’s service and spoke of Nature’s fairest things being corrupted with the secrets of the Spanish Inquisition—by the man with the little spade, of course. The atheist (who wore a red rosette) was very withering to his own audience as well. “Hypocrites!” he would say; and then they would throw him money. “Dupes and dastards!” and then they would throw him more money. But between the atheist and the children’s service was a little owlish man in a red fez, weakly waving a green gamp umbrella. His face was brown and wrinkled like a walnut, his nose was of the sort we associate with Judæa, his beard was the sort of black wedge we associate rather with Persia. The young woman had never seen him before; he was a new exhibit in the now familiar museum of cranks and quacks. 
The young woman was one of those people in whom a real sense of humour is always at issue with a certain temperamental tendency to boredom or melancholia; and she lingered a moment, and leaned on the rail to listen.
It was fully four minutes before she could understand a word the man was saying; he spoke English with so extraordinary an accent that she supposed at first that he was talking in his own oriental tongue. All the noises of that articulation were odd; the most marked was an extreme prolongation of the short “u” into “oo”; as in “poo-oot” for “put.” Gradually the girl got used to the dialect, and began to understand the words; though some time elapsed even then before she could form any conjecture of their subject matter. Eventually it appeared to her that he had some fad about English civilisation having been founded by the Turks; or, perhaps by the Saracens after their victory in the Crusades. He also seemed to think that Englishmen would soon return to this way of thinking; and seemed to be urging the spread of teetotalism as an evidence of it. The girl was the only person listening to him.
“Loo-ook,” he said, wagging a curled brown finger, “loo-ook at your own inns” (which he pronounced as “ince”). “Your inns of which you write in your boo-ooks! Those inns were not poo-oot up in the beginning to sell ze alcoholic Christian drink. They were put up to sell ze non-alcoholic Islamic drinks. You can see this in the names of your inns. They are eastern names, Asiatic names. You have a famous public house to which your omnibuses go on the pilgrimage. It is called the Elephant and Castle. That is not an English name. It is an Asiatic name. You will say there are castles in England, and I will agree with you. There is the Windsor Castle. But where,” he cried sternly, shaking his green umbrella at the girl in an angry oratorical triumph, “where is the Windsor Elephant? I have searched all Windsor Park. No elephants.”
The girl with the dark hair smiled, and began to think that this man was better than any of the others. In accordance with the strange system of concurrent religious endowment which prevails at watering-places, she dropped a two shilling piece into the round copper tray beside him. With honourable and disinterested eagerness, the old gentleman in the red fez took no notice of this, but went on warmly, if obscurely, with his argument.
“Then you have a place of drink in this town which you call The Bool!'''
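word_list = text.split()  # split on any run of whitespace, including line breaks
print(word_list)
word_list2 = []
for item in word_list:
    # Remove punctuation attached to the word; the raw string keeps the pattern valid
    item = re.sub(r'[,.;?!()“”]', '', item)
    if item:
        word_list2.append(item)
# Deduplicate after cleaning so that "him," and "him" collapse into one entry
word_list2 = list(set(word_list2))
print(word_list2)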
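
A variation on the split() approach trims punctuation from the ends of each token with `str.strip()` instead of a regular expression. This is a minimal sketch under stated assumptions: the names `TRIM_CHARS` and `split_words` are hypothetical, and the curly quotes are added by hand because `string.punctuation` covers ASCII only.

```python
import string

# Characters to trim from both ends of each token; the curly quotes are added
# because string.punctuation only contains ASCII punctuation.
TRIM_CHARS = string.punctuation + "“”‘’"


def split_words(text):
    """Split on whitespace, trim punctuation, and drop duplicates in order."""
    seen = set()
    words = []
    for token in text.split():
        word = token.strip(TRIM_CHARS)
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


sample = "“Hypocrites!” he would say; and then they would throw him money."
print(split_words(sample))
# ['Hypocrites', 'he', 'would', 'say', 'and', 'then', 'they', 'throw', 'him', 'money']
```

Because `strip()` only touches the ends of a token, hyphens and apostrophes inside a word are kept, and the `seen` set preserves first-appearance order, which a plain `set()` does not.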

That is the general approach; adjust the details to suit your own needs.
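
As one example of such an adjustment, a single regular expression can replace the separate splitting and cleaning steps. This swaps in a different technique, `re.findall`, and is only a sketch: the pattern and the names `WORD_PATTERN` and `find_words` are assumptions for illustration.

```python
import re

# One or more ASCII letters, optionally followed by further letter groups
# joined by a hyphen or an apostrophe (straight or curly).
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:[-'’][A-Za-z]+)*")


def find_words(text):
    """Return every match of WORD_PATTERN, in order of appearance."""
    return WORD_PATTERN.findall(text)


sample = "trailing a parasol along Pebblewick-on-Sea, looking at the sea’s horizon."
print(find_words(sample))
# ['trailing', 'a', 'parasol', 'along', 'Pebblewick-on-Sea', 'looking', 'at', 'the', 'sea’s', 'horizon']
```

The pattern only recognizes the letters A to Z, so a word containing other letters, such as Judæa in the sample passage, would be split; widen the character class if that matters for your data.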

That concludes soogor's summary of quick ways to extract English words in Python.

Edited by: soogor