
How can I extract the month and year from a string in Python?

郎朗坤 2022-12-14 21:05:49
Input text:

text = "Wipro Limited | Hyderabad, IN                Dec 2017 – PresentProject Analyst Infosys | Delhi, IN                Apr 2017 – Nov 2017 Software Developer HCL Technologies | Hyderabad, IN                Jun 2016 – Mar 2017 Software Engineer  "

I have written code for this, but it shows each extracted word as a separate item in a list, and I can't do anything further with the result.

regex = re.compile('(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s+\–\s+(?P<month1>[a-zA-Z]+)\s+(?P<year1>\d{4})')
mat = re.findall(regex, text)
mat

See the code: https://regex101.com/r/mMlgYp/1

I would like output like the following, so that I can preview the dates, tell them apart, and then compute the total experience. Here "Present" or "Till date" should be treated as the current month and year.

import time
Present = time.strftime("%m-%Y")
Present  # output: '05-2020'

# Desired output
Extracted dates: [('Dec 2017 - Present'), ('Apr 2017 - Nov 2017'), ('Jun 2016 - Mar 2017')]
# and so on ... should display all the search results
First experience: 1.9 years
second experience: 8 months
third experience: 7 months
# and so on ... should display all the search results
Total experience: 3.4 years

Please help me; I am new to programming, NLP and regular expressions.

2 answers

ABOUTYOU

Contributed 1812 experience points, received more than 5 likes

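You will probably want this in a DataFrame eventually, since you tagged the question pandas (see Andrej's answer), but either way you can parse the dates out of the string with a regular expression built by string interpolation: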


fr"(?i)((?:{months}) *\d{{4}}) *(?:-|–) *(present|(?:{months}) *\d{{4}})"

where {months} is an alternation group of all possible month names and abbreviations.
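
To make the interpolation concrete, here is a minimal sketch that builds the alternation from the `calendar` module and runs the pattern on a shortened sample. The `sample` string and the `pairs` name are made up for illustration, and the month names assume the default English (C) locale. Note that the doubled braces `{{4}}` inside the f-string become the literal regex quantifier `{4}`.

```python
import calendar
import re

# "Jan|Feb|...|Dec|January|...|December" (English names assumed).
months = "|".join(calendar.month_abbr[1:] + calendar.month_name[1:])

# In an f-string, {{4}} is emitted as the literal regex quantifier {4}.
pattern = fr"(?i)((?:{months}) *\d{{4}}) *(?:-|–) *(present|(?:{months}) *\d{{4}})"

# Hypothetical sample mixing an en dash, a hyphen, and a full month name.
sample = "Acme | Pune, IN   Dec 2017 – Present  Foo | Delhi, IN   April 2017 - Nov 2017"

# With two capturing groups, findall returns one (start, end) tuple per match.
pairs = re.findall(pattern, sample)
print(pairs)  # [('Dec 2017', 'Present'), ('April 2017', 'Nov 2017')]
```

Because each match comes back as a `(start, end)` tuple, the two ends of a date range stay paired and can be unpacked directly in a `for` loop.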


import calendar
import re
from datetime import datetime

from dateutil.relativedelta import relativedelta

text = """Wipro Limited | Hyderabad, IN                Dec 2017 – Present
Project Analyst

Infosys | Delhi, IN                Apr 2017 – Nov 2017
Software Developer

HCL Technologies | Hyderabad, IN                Jun 2016 – Mar 2017
Software Engineer
"""


def parse_date(x, fmts=("%b %Y", "%B %Y")):
    # Try the abbreviated month name first, then the full name.
    for fmt in fmts:
        try:
            return datetime.strptime(x, fmt)
        except ValueError:
            pass
    # Fail loudly instead of silently returning None.
    raise ValueError(f"unrecognised date: {x!r}")


months = "|".join(calendar.month_abbr[1:] + calendar.month_name[1:])
pattern = fr"(?i)((?:{months}) *\d{{4}}) *(?:-|–) *(present|(?:{months}) *\d{{4}})"
total_experience = None

for start, end in re.findall(pattern, text):
    # "Present" stands for the current month and year.
    if end.lower() == "present":
        today = datetime.today()
        end = f"{calendar.month_abbr[today.month]} {today.year}"

    duration = relativedelta(parse_date(end), parse_date(start))

    if total_experience is None:
        total_experience = duration
    else:
        total_experience += duration

    print(f"{start}-{end} ({duration.years} years, {duration.months} months)")

if total_experience is not None:
    print(f"total experience:  {total_experience.years} years, {total_experience.months} months")
else:
    print("couldn't parse text")

Output (when run in May 2020):

Dec 2017-May 2020 (2 years, 5 months)
Apr 2017-Nov 2017 (0 years, 7 months)
Jun 2016-Mar 2017 (0 years, 9 months)
total experience:  3 years, 9 months
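
The question also mentions "Till date" as a synonym for "Present" and asks for a decimal total such as "3.4 years". Here is a minimal standard-library sketch of one way to handle both. It is an illustration under stated assumptions, not part of the answer above: the helper names `parse_month`, `months_between` and `total_months` are hypothetical, the sample string is made up, and `today` is passed in explicitly so the result is reproducible.

```python
import calendar
import re
from datetime import date, datetime

MONTHS = "|".join(calendar.month_abbr[1:] + calendar.month_name[1:])
# Assumption: "till date" is accepted alongside "present" as an open-ended range.
PATTERN = re.compile(
    fr"(?i)((?:{MONTHS}) *\d{{4}}) *(?:-|–) *(present|till date|(?:{MONTHS}) *\d{{4}})"
)


def parse_month(text):
    """Parse 'Dec 2017' or 'December 2017' into a date on the 1st of that month."""
    for fmt in ("%b %Y", "%B %Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised month/year: {text!r}")


def months_between(start, end):
    """Whole calendar months from start to end (both are dates)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def total_months(text, today):
    """Sum the length in months of every date range found in text."""
    total = 0
    for start, end in PATTERN.findall(text):
        open_ended = end.lower() in ("present", "till date")
        end_date = today if open_ended else parse_month(end)
        total += months_between(parse_month(start), end_date)
    return total


sample = "A | X  Dec 2017 – Till date  B | Y  Apr 2017 – Nov 2017  C | Z  Jun 2016 – Mar 2017"
months = total_months(sample, today=date(2020, 5, 1))
print(f"{months} months = {months / 12:.1f} years")
```

Counting whole calendar months avoids any dependence on how many days each month has; with `today` fixed to May 2020 the three ranges contribute 29, 7 and 9 months.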


Answered 2022-12-14

回首忆惘然

Contributed 1847 experience points, received more than 11 likes

import re

import numpy as np
import pandas as pd

text = '''Wipro Limited | Hyderabad, IN                Dec 2017 – Present
Project Analyst

Infosys | Delhi, IN                Apr 2017 – Nov 2017
Software Developer

HCL Technologies | Hyderabad, IN                Jun 2016 – Mar 2017
Software Engineer
'''

# Average Gregorian month length. np.timedelta64(1, 'M') is an ambiguous unit
# and newer pandas versions refuse to divide by it.
ONE_MONTH = pd.Timedelta(days=365.2425 / 12)


def pretty_format(months):
    return f'{months / 12:.1f} years' if months > 11 else f'{months:.1f} months'


data = []
for employer, d1, d2 in re.findall(r'(.*?)\s*\|.*([A-Z][a-z]{2} [12]\d{3}) – (?:([A-Z][a-z]{2} [12]\d{3})|Present)', text):
    # For "Present" the end-date group is empty, so store NaN (it becomes NaT below).
    data.append({'Employer': employer, 'Begin': d1, 'End': d2 or np.nan})

df = pd.DataFrame(data)
df['Begin'] = pd.to_datetime(df['Begin'], format='%b %Y')
df['End'] = pd.to_datetime(df['End'], format='%b %Y')

# Open-ended jobs run until now.
elapsed = df['End'].fillna(pd.Timestamp.now()) - df['Begin']
df['Experience'] = (elapsed / ONE_MONTH).apply(pretty_format)
print(df)

total = elapsed.sum() / ONE_MONTH
print()
print(f'Total experience = {pretty_format(total)}')

Prints:

           Employer      Begin        End  Experience
0     Wipro Limited 2017-12-01        NaT   2.5 years
1           Infosys 2017-04-01 2017-11-01  7.0 months
2  HCL Technologies 2016-06-01 2017-03-01  9.0 months

Total experience = 3.8 years
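
The `d2 or np.nan` step in the code above relies on how `re.findall` reports an optional group that did not take part in the match: it comes back as an empty string. The sketch below shows this using only the standard library; the two-line `text` is a shortened sample and the `rows` name is chosen for illustration.

```python
import re

# Same pattern as in the answer: employer, begin date, and an end date
# that is captured only when it is not the literal word "Present".
pattern = re.compile(
    r'(.*?)\s*\|.*([A-Z][a-z]{2} [12]\d{3}) – (?:([A-Z][a-z]{2} [12]\d{3})|Present)'
)

text = (
    "Wipro Limited | Hyderabad, IN    Dec 2017 – Present\n"
    "Infosys | Delhi, IN    Apr 2017 – Nov 2017\n"
)

rows = pattern.findall(text)
for employer, begin, end in rows:
    # An empty string is falsy, so `end or None` marks an open-ended job.
    print(employer, begin, end or None)
```

For the first line the third group is `''`, which is why the answer can replace it with a missing value before converting the column to dates.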


Answered 2022-12-14