View: 32261|Reply: 11

[JavaSE] A crawler I just wrote in Java; for now it can only download images (depth 1), and I will continue...

Posted on 6/3/2015 2:38:12 AM |
As the title suggests

crawler.rar (62.53 KB, Number of downloads: 5, Selling price: 2 Grain MB)
Posted on 6/3/2015 9:05:36 PM |
Simple implementation that doesn't depend on other packages

package test;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.imageio.ImageIO;

public class Test {
        public static void main(String[] args) {
                String web = "http://www.itsvse.com/";
                try {
                        URL url = new URL(web);

                        // Read the whole page into memory; try-with-resources closes the stream.
                        StringBuilder builder = new StringBuilder();
                        try (Reader reader = new InputStreamReader(url.openStream())) {
                                char[] buff = new char[1024];
                                int n;
                                while ((n = reader.read(buff)) != -1) {
                                        builder.append(buff, 0, n);
                                }
                        }

                        // Group 1 is the image URL up to the extension, group 2 is the extension.
                        // The inner double quotes must be escaped inside the Java string literal.
                        Pattern pattern = Pattern.compile("<img.*?src=\"(.*?)(gif|png|jpg)\"");

                        Matcher m = pattern.matcher(builder);
                        while (m.find()) {
                                String u = m.group(1) + m.group(2);
                                System.out.println("downloading.." + u);
                                // new URL(context, spec) resolves relative paths against the page URL
                                // and leaves absolute URLs unchanged.
                                URL img = new URL(url, u);
                                BufferedImage image = ImageIO.read(img);
                                if (image == null) {
                                        continue; // ImageIO could not decode this resource
                                }
                                // The D:/img/ directory must already exist.
                                ImageIO.write(image, m.group(2), new File("D:/img/" + System.currentTimeMillis() + "." + m.group(2)));
                        }
                } catch (MalformedURLException e) {
                        e.printStackTrace();
                } catch (IOException e) {
                        e.printStackTrace();
                }
        }
}
Posted on 6/4/2015 7:19:48 PM |
Delver_Si Posted on 2015-6-3 23:57
Developing with raw code like this is too inefficient. Thumbs down.

I wasn't going to say anything, but since you call my development inefficient...

A program is judged by the quality and performance of its code, and in the end yours has few features, poor extensibility, and poor performance.


Run it 10 times in a row, each time ignoring network latency and local saves, and measure only the time spent parsing the HTML document: your program falls far behind.
There are also errors in your code, so I'll leave it at that.
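
The measurement described in this reply (10 runs, parsing time only, no network or disk) could be sketched roughly as follows. This is an illustration under assumptions, not either poster's actual benchmark: the class ParseTiming, its methods, and the synthetic HTML are hypothetical, and the regex is a simplified stand-in for whatever parser is being compared.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseTiming {
    // Simplified image-tag pattern (an assumption, not taken from crawler.rar).
    private static final Pattern IMG = Pattern.compile("<img[^>]*?src=\"(.*?)\"");

    // Counts <img src="..."> occurrences in an in-memory HTML string.
    public static int countImages(String html) {
        Matcher m = IMG.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Average wall-clock nanoseconds per parse over the given number of runs.
    // Only the parsing call is inside the timed region.
    public static long averageParseNanos(String html, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            countImages(html);
            total += System.nanoTime() - start;
        }
        return total / runs;
    }

    public static void main(String[] args) {
        // Synthetic page so that no network access is involved.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("<div><img src=\"/img/").append(i).append(".png\"></div>");
        }
        String html = sb.toString();
        System.out.println("images found: " + countImages(html));
        System.out.println("average parse time (ns) over 10 runs: " + averageParseNanos(html, 10));
    }
}
```

Ten runs is a very small sample, and the first runs include JIT warm-up, so numbers from a loop like this are only a rough comparison.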

Posted on 6/3/2015 1:00:52 PM |

I don't have Eclipse installed, so I read it in Notepad: first it grabs the HTML source of the page, then gets the value after src, and then saves it.

I don't know if that's right.
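
The middle step described above, getting the value after src, can be tried without any network access. The following is a minimal sketch, assuming a regex is acceptable for this job; the class ImgSrcExtractor and its extract method are hypothetical names, not code from the attached crawler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcExtractor {
    // Matches the quoted src attribute of an <img> tag, case-insensitively.
    // A regex only approximates HTML parsing: unquoted attribute values are missed.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the src values in document order.
    public static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            result.add(m.group(2)); // group 1 is the quote character, group 2 the value
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p><img class=\"x\" src=\"/a/logo.png\">"
                + "<IMG SRC='http://example.com/b.jpg'></p>";
        for (String src : extract(html)) {
            System.out.println(src);
        }
    }
}
```

Each extracted value may be relative, so it still has to be resolved against the page URL before downloading, for example with new URL(pageUrl, src).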
Posted on 6/3/2015 7:49:23 AM |
Can images in PNG format be grabbed?
 Original poster | Posted on 6/3/2015 10:17:34 AM |

Yes. I don't check the suffix yet, so everything is saved as .jpg; in practice a PNG image still opens with a .jpg suffix. I will improve the suffix handling.
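
One possible way to do the suffix check mentioned above is to take the extension from the image URL and fall back to jpg only when none is recognized. This is a sketch under assumptions: the class ImageExtension, the method of, and the list of known extensions are hypothetical and not part of the attached crawler.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ImageExtension {
    // Extensions treated as images here; the list is an assumption.
    private static final List<String> KNOWN = Arrays.asList("png", "jpg", "jpeg", "gif");

    // Returns the lower-cased extension of the URL's path, or the fallback
    // when the path has no recognized image extension.
    public static String of(String url, String fallback) {
        String path = url;
        int cut = path.indexOf('?');          // drop the query string
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        cut = path.indexOf('#');              // drop the fragment
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        // The dot must come after the last slash, otherwise it belongs to the host name.
        if (dot > slash && dot < path.length() - 1) {
            String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
            if (KNOWN.contains(ext)) {
                return ext;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(of("http://example.com/a/b.PNG?x=1", "jpg"));
        System.out.println(of("http://example.com/noext", "jpg"));
    }
}
```

The URL suffix is only a hint; a server can return any format under any name, so checking the response's Content-Type header would be more reliable.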
Posted on 6/3/2015 12:52:13 PM |
Let me study it
 Original poster | Posted on 6/3/2015 12:57:13 PM |
Xiao Zhazha Posted on 2015-6-3 12:52
Let me study it

How is the studying going?
 Original poster | Posted on 6/3/2015 1:05:27 PM |
Xiao Zhazha Posted on 2015-6-3 13:00
I don't have Eclipse installed, so I read it in Notepad: first it grabs the HTML source of the page, then gets the value after src, and then saves the rough ...

That's right
Posted on 6/3/2015 9:12:09 PM |
microxdd posted on 2015-6-3 21:05
Simple implementation that doesn't depend on other packages

This is going to force me to install MyEclipse!
 Original poster | Posted on 6/3/2015 11:57:27 PM |
microxdd posted on 2015-6-3 21:05
Simple implementation that doesn't depend on other packages

Developing with raw code like this is too inefficient. Thumbs down.