
How to extract URLs from HTML

Posted December 26th by Dmytro Gorbunov

Sometimes I need to download many (but not all) of the URLs from a web page. This post describes one way to do that.

I find the wget console utility handy for downloading everything. This download manager has an -i option that downloads all the URLs listed in a file.

So, to download many links, do the following:

  1. In Firefox, select the part of the page that contains the interesting URLs and choose “View Selection Source”
  2. Copy the HTML code that contains the links you want to download
  3. Create a file, for example “links.txt”, containing that HTML.
  4. Run
    perl extractUrls.pl links.txt > li.txt

    to extract the links.

  5. Run
    wget -i li.txt

    to download the extracted links.
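
Steps 4 and 5 can also be chained in one small program. What follows is only a sketch in Python, not part of the original Perl script. The function names extract_urls, save_url_list and download_all are made up for illustration. It reuses the same regular expression as extractUrls.pl, additionally drops duplicate links (an assumption about what is useful), and assumes that wget is installed and on the PATH.

```python
import re
import subprocess

# Same pattern as extractUrls.pl: a double quote followed by text starting with "http".
URL_PATTERN = re.compile(r'"(http[^"]+)')


def extract_urls(html):
    """Return the double-quoted http(s) URLs found in html, in order, without duplicates."""
    seen = set()
    urls = []
    for url in URL_PATTERN.findall(html):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def save_url_list(urls, path):
    """Write one URL per line, which is the format that wget -i reads."""
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")


def download_all(list_path):
    """Run `wget -i list_path` (assumes wget is installed and on the PATH)."""
    subprocess.run(["wget", "-i", list_path], check=True)
```

For example, reading links.txt into a string, passing it through extract_urls, saving the result with save_url_list(urls, "li.txt") and then calling download_all("li.txt") should do roughly the same as steps 4 and 5.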

extractUrls.pl:

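#!perl -w
use strict;
# Read the file(s) named on the command line (or standard input) line by line.
while (<>) {
    # Print every double-quoted value that starts with "http", one per line.
    print "$_\n" foreach (/"(http[^"]+)/g);
}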
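
The regular expression above only sees URLs inside double quotes, and it leaves HTML entities such as &amp; undecoded. As a possible alternative, here is a minimal sketch in Python (not Perl) built on the standard html.parser module. The names LinkCollector and collect_links are hypothetical, and looking only at href and src attributes is an assumption about which links matter.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from href and src attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser has already
        # removed the quotes and decoded entities such as &amp;.
        for name, value in attrs:
            if name in ("href", "src") and value and value.startswith("http"):
                self.links.append(value)


def collect_links(html):
    """Return the http(s) links found in html, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Writing the returned links to a file one per line gives the same kind of list that wget -i expects.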

I can cut out some of these steps for you by using the clipboard or other techniques, if you are really interested. I can also make a video tutorial on request.

Regards,

Dmytro


Posted on: Wednesday, December 26, 2007 at 7:21 pm
Category: web surfing.