A Python script to extract titles and URLs from an OPML file

Author: 麻辣阁 Category: python Published: 2018-11-27 14:59
s = """<outline text="English" title="English" type="rss" version="RSS" htmlUrl="http://english.people.com.cn" xmlUrl="http://english.people.com.cn/rss/90000.xml"/>
<outline text="Leaders" title="Leaders" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/102839" xmlUrl="http://english.people.com.cn/rss/102839.xml"/>
<outline text="Opinions" title="Opinions" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/90780" xmlUrl="http://english.people.com.cn/rss/90780.xml"/>
<outline text="China_Politics" title="China_Politics" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/90785" xmlUrl="http://english.people.com.cn/rss/90785.xml"/>
<outline text="Foreign_Affairs" title="Foreign_Affairs" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/90883" xmlUrl="http://english.people.com.cn/rss/90883.xml"/>
<outline text="China_Military" title="China_Military" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/90786" xmlUrl="http://english.people.com.cn/rss/90786.xml"/>
<outline text="China_Business" title="China_Business" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/90778" xmlUrl="http://english.people.com.cn/rss/90778.xml"/>
<outline text="China_Society" title="China_Society" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/90882" xmlUrl="http://english.people.com.cn/rss/90882.xml"/>
<outline text="China_Features" title="China_Features" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/102780" xmlUrl="http://english.people.com.cn/rss/102780.xml"/>
<outline text="World" title="World" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/90777" xmlUrl="http://english.people.com.cn/rss/90777.xml"/>
<outline text="Life_Culture" title="Life_Culture" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/90782" xmlUrl="http://english.people.com.cn/rss/90782.xml"/>
<outline text="Science_Education" title="Science_Education" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/202963" xmlUrl="http://english.people.com.cn/rss/202963.xml"/>
<outline text="Photo" title="Photo" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/90783" xmlUrl="http://english.people.com.cn/rss/90783.xml"/>
<outline text="Video" title="Video" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/98389" xmlUrl="http://english.people.com.cn/rss/98389.xml"/>
<outline text="Sports" title="Sports" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/90779" xmlUrl="http://english.people.com.cn/rss/90779.xml"/>
<outline text="PD_Online_Database" title="PD_Online_Database" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/102759" xmlUrl="http://english.people.com.cn/rss/102759.xml"/>
<outline text="Special_Coverage" title="Special_Coverage" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/102775" xmlUrl="http://english.people.com.cn/rss/102775.xml"/>
"""

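# Split the sample into one <outline .../> element per line.
slist = s.split('\n')

for channel in slist:
    # Skip the empty string left over after the final newline.
    if len(channel) > 10:
        # The title value sits between 'title="' and the '" type=' that follows it.
        titlestart = channel.find('title=')
        titleend = channel.find('type=', titlestart)
        # The feed URL sits between 'xmlUrl="' and the closing '"/>'.
        urlstart = channel.find('xmlUrl="')
        urlend = channel.find('"/>', urlstart)
        title = channel[titlestart + 7:titleend - 2]
        url = channel[urlstart + 8:urlend]
        # Print each feed as a Markdown link.
        print("People's Daily Online - " + title + ': [' + url + '](' + url + ')')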

Output:

People's Daily Online - English: [http://english.people.com.cn/rss/90000.xml](http://english.people.com.cn/rss/90000.xml)
People's Daily Online - Leaders: [http://english.people.com.cn/rss/102839.xml](http://english.people.com.cn/rss/102839.xml)
People's Daily Online - Opinions: [http://english.people.com.cn/rss/90780.xml](http://english.people.com.cn/rss/90780.xml)
People's Daily Online - China_Politics: [http://english.people.com.cn/rss/90785.xml](http://english.people.com.cn/rss/90785.xml)
People's Daily Online - Foreign_Affairs: [http://english.people.com.cn/rss/90883.xml](http://english.people.com.cn/rss/90883.xml)
People's Daily Online - China_Military: [http://english.people.com.cn/rss/90786.xml](http://english.people.com.cn/rss/90786.xml)
People's Daily Online - China_Business: [http://english.people.com.cn/rss/90778.xml](http://english.people.com.cn/rss/90778.xml)
People's Daily Online - China_Society: [http://english.people.com.cn/rss/90882.xml](http://english.people.com.cn/rss/90882.xml)
People's Daily Online - China_Features: [http://english.people.com.cn/rss/102780.xml](http://english.people.com.cn/rss/102780.xml)
People's Daily Online - World: [http://english.people.com.cn/rss/90777.xml](http://english.people.com.cn/rss/90777.xml)
People's Daily Online - Life_Culture: [http://english.people.com.cn/rss/90782.xml](http://english.people.com.cn/rss/90782.xml)
People's Daily Online - Science_Education: [http://english.people.com.cn/rss/202963.xml](http://english.people.com.cn/rss/202963.xml)
People's Daily Online - Photo: [http://english.people.com.cn/rss/90783.xml](http://english.people.com.cn/rss/90783.xml)
People's Daily Online - Video: [http://english.people.com.cn/rss/98389.xml](http://english.people.com.cn/rss/98389.xml)
People's Daily Online - Sports: [http://english.people.com.cn/rss/90779.xml](http://english.people.com.cn/rss/90779.xml)
People's Daily Online - PD_Online_Database: [http://english.people.com.cn/rss/102759.xml](http://english.people.com.cn/rss/102759.xml)
People's Daily Online - Special_Coverage: [http://english.people.com.cn/rss/102775.xml](http://english.people.com.cn/rss/102775.xml)
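
The string slicing above depends on the attributes appearing in exactly this order (title, then type, with xmlUrl last). As a rough alternative sketch, the same extraction can be done with the standard library's xml.etree.ElementTree, which reads attributes by name. The two-line `sample`, the `extract_feeds` helper, and the dummy `<body>` wrapper are assumptions made for illustration and are not part of the original script.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-line sample in the same shape as the string `s` above.
sample = """<outline text="English" title="English" type="rss" version="RSS" htmlUrl="http://english.people.com.cn" xmlUrl="http://english.people.com.cn/rss/90000.xml"/>
<outline text="Leaders" title="Leaders" type="rss" version="RSS" htmlUrl="http://english.people.com.cn/102839" xmlUrl="http://english.people.com.cn/rss/102839.xml"/>
"""


def extract_feeds(fragment):
    """Return (title, xmlUrl) pairs for every <outline> element in the fragment."""
    # The fragment has several top-level elements, so wrap it in a dummy root
    # to make it well-formed XML before parsing.
    root = ET.fromstring("<body>" + fragment + "</body>")
    feeds = []
    for outline in root.iter("outline"):
        title = outline.get("title")
        url = outline.get("xmlUrl")
        if title and url:  # skip outlines that carry no title or feed URL
            feeds.append((title, url))
    return feeds


# Print each feed as a Markdown link, in the same format as the script above.
for title, url in extract_feeds(sample):
    print("People's Daily Online - " + title + ": [" + url + "](" + url + ")")
```

Because the parser looks attributes up by name, this version should keep working if the attribute order changes or if several outline elements share one line, at the cost of requiring the input to be well-formed XML.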
