I'm a newbie in Python. I am learning regexes, but I need help here.
Here comes the HTML source:
<a href="http://www.ptop.se" target="_blank">http://www.ptop.se</a>
I'm trying to write a tool that only prints http://ptop.se. Can you help me?
If you are looking for only one:
    import re

    match = re.search(r'href=[\'"]?([^\'" >]+)', s)
    if match:
        # group(1) is the URL itself; group(0) would also include the leading href="
        print(match.group(1))
If you have a long string, and want every instance of the pattern in it:
    import re

    urls = re.findall(r'href=[\'"]?([^\'" >]+)', s)
    print(', '.join(urls))
where s is the string you are looking for matches in.
Quick explanation of the regexp bits:
r'...' is a "raw" string. It saves you from having to worry about escaping characters quite as much as you normally would. (\ especially: in a raw string, a \ is just a \. In a regular string you would have to write \\ every time, and that gets old in regexps.)
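To see the raw-string point concretely, here is a minimal sketch. The variable names and the test string 'a1b22c333' are made up for illustration; they are not from the question.

```python
import re

# A regular string needs the backslash doubled to get one literal backslash.
regular = '\\d+'
# A raw string keeps the backslash exactly as typed.
raw = r'\d+'

print(regular == raw)  # True: both are the three characters \ d +
print(len(r'\n'))      # 2: a backslash and the letter n, not a newline
print(len('\n'))       # 1: a single newline character

# Either spelling gives the regex engine the same pattern.
digits = re.findall(raw, 'a1b22c333')
print(digits)          # ['1', '22', '333']
```

The raw form is the one worth getting used to, since regexps tend to be full of backslashes.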
href=[\'"]? says to match "href=", possibly followed by a ' or ". "Possibly" because it's hard to say how terrible the HTML you are looking at is, and the quotes are not strictly required.
Enclosing the next bit in () says to make it a "group", which means to split it out and return it to us separately. It's just a way to say "this is the part of the pattern I'm interested in."
[^\'" >]+ says to match any characters that are not ', ", >, or a space. Essentially this is a list of the characters that mark the end of the URL. It lets us avoid trying to write a regexp that reliably matches a full URL, which can be a bit complicated.
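Putting the pieces together, here is a rough sketch of the same pattern written with re.VERBOSE so each part can carry a comment. The name HREF and the "messy" sample (with the example.com-style URLs) are my own assumptions for illustration; only the first string comes from the question.

```python
import re

# The same pattern, spread out with re.VERBOSE so each piece is commented.
HREF = re.compile(r'''
    href=          # the literal text href=
    ['"]?          # an optional opening quote, single or double
    ([^'"\ >]+)    # group 1: everything up to a quote, a space, or >
''', re.VERBOSE)

# The HTML from the question.
s = '<a href="http://www.ptop.se" target="_blank">http://www.ptop.se</a>'
m = HREF.search(s)
whole = m.group(0)  # the entire match, including href= and the opening quote
url = m.group(1)    # only the part captured by the parentheses
print(whole)
print(url)

# Hypothetical messy input: double-quoted, single-quoted and unquoted hrefs.
messy = '''<a href="http://a.example/x">A</a>
<a href='http://b.example/y'>B</a>
<a href=http://c.example/z target=_blank>C</a>'''

# With exactly one group in the pattern, findall returns just that group.
urls = HREF.findall(messy)
print(', '.join(urls))
```

Note that the space inside the character class is written as an escaped space here, since re.VERBOSE otherwise ignores most whitespace in the pattern.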
The suggestion in another answer to use BeautifulSoup is not bad, but it does introduce a higher level of external requirements. Plus, it does not help you in your stated goal of learning regexps, which I assume this specific HTML-parsing project is just a part of.
It is pretty easy to do, once you have BeautifulSoup installed, anyway.
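For comparison, if the install step is the sticking point, a parser-based approach can also be sketched with only the standard library's html.parser. To be clear, this is not the BeautifulSoup code, just a stand-in for the same idea, and the class name HrefCollector is hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper, not from the answer)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names already lowercased.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


# The question's HTML, with upper-case tag and attribute names to show
# that the parser normalises them.
s = '<A HREF="http://www.ptop.se" target="_blank">http://www.ptop.se</A>'
collector = HrefCollector()
collector.feed(s)
print(collector.hrefs)
```

It is more code than the regexp, but it does not depend on guessing how the attribute is quoted.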