A crawler I just wrote in Java. For now it can only download images (depth 1), and will continue...

Delver_Si · Posted on 6/3/2015 2:38:12 AM

As the title suggests

crawler.rar (62.53 KB, Number of downloads: 5, Selling price: 2 Grain MB)

microxdd · Posted on 6/3/2015 9:05:36 PM

Simple implementation that doesn't depend on other packages

package test;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.imageio.ImageIO;

public class Test {
    public static void main(String[] args) {
        String web = "http://www.itsvse.com/";
        try {
            URL url = new URL(web);
            // Read the whole page into memory; the reader is closed automatically.
            StringBuilder builder = new StringBuilder();
            try (Reader reader = new InputStreamReader(url.openStream())) {
                char[] buff = new char[1024];
                int n;
                while ((n = reader.read(buff)) != -1) {
                    builder.append(buff, 0, n);
                }
            }
            // Group 1 is the src value up to the extension, group 2 is the extension.
            // The inner double quotes must be escaped inside a Java string literal.
            Pattern pattern = Pattern.compile("<img.*?src=\"(.*?)(gif|png|jpg)\"");
            Matcher m = pattern.matcher(builder);
            while (m.find()) {
                String u = m.group(1) + m.group(2);
                System.out.println("downloading.." + u);
                URL img = null;
                if (u.startsWith("http")) {
                    img = new URL(u);
                } else {
                    // Resolve a relative src against the page URL.
                    img = new URL(url, u);
                }
                // ImageIO.read returns null when no reader can decode the data.
                BufferedImage image = ImageIO.read(img);
                if (image == null) {
                    continue;
                }
                ImageIO.write(image, m.group(2), new File("D:/img/" + System.currentTimeMillis() + "." + m.group(2)));
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


microxdd · Posted on 6/4/2015 7:19:48 PM

Delver_Si Posted on 2015-6-3 23:57
Writing it in raw code like this is too inefficient to develop. Bad review.

I didn't want to say anything, but since you say my development is inefficient...

What matters in a program is code quality and performance. In the end, yours has few features, poor extensibility, and poor performance.

Run each program 10 times in a row, ignoring network latency and local saves and timing only the HTML parsing, and yours is far behind.
Also, there are errors in your code, but I won't go into them.
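
For anyone who wants to reproduce that kind of comparison, here is a minimal sketch of timing only the parse step on an HTML string that is already in memory, so network latency and disk writes are excluded. The class name ParseTimer and its methods are hypothetical, the regex is the one from the simple implementation in this thread, and 10 runs with no JIT warm-up gives only a rough number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not code from either poster's program.
final class ParseTimer {
    // Same pattern as the simple implementation in this thread.
    private static final Pattern IMG =
            Pattern.compile("<img.*?src=\"(.*?)(gif|png|jpg)\"");

    // The "parse" step being timed: count the image tags the regex finds.
    static int countImages(String html) {
        Matcher m = IMG.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Average wall-clock time of the parse step over the given number of runs.
    static long averageParseNanos(String html, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            countImages(html);
            total += System.nanoTime() - start;
        }
        return total / runs;
    }

    public static void main(String[] args) {
        String html = "<html><img src=\"a.png\"><img src=\"b.gif\"></html>";
        System.out.println("average parse time (ns): " + averageParseNanos(html, 10));
    }
}
```

To compare two programs fairly, both would need to be fed the same saved HTML string.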

Little scum · Posted on 6/3/2015 1:00:52 PM

Delver_Si Posted on 2015-6-3 12:57
How's the studying going?

I don't have Eclipse installed, so I read it in Notepad: first it grabs the HTML source of the web page, then it gets the value after src, and then it saves the image.

I don't know if that's right.
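
The middle step described above (getting the value after src) can be sketched on its own, without any network access. This is a minimal illustration and not the code from crawler.rar; the class name SrcExtractor and its method are hypothetical, and a regex like this only handles quoted src attributes in simple markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the src value out of every <img> tag in a page.
final class SrcExtractor {
    // Matches <img ... src="value"> or <img ... src='value'>, case-insensitively.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    static List<String> extractImageSrcs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the text between the quotes
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p><img src=\"a.png\"><IMG alt='x' src='http://h/b.jpg'></p>";
        for (String src : extractImageSrcs(html)) {
            System.out.println(src);
        }
    }
}
```

Each value returned this way still has to be resolved against the page URL before downloading, because a src is often a relative path.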

The knife is flying · Posted on 6/3/2015 7:49:23 AM

Can images in PNG format be grabbed?

Delver_Si · Posted on 6/3/2015 10:17:34 AM

The knife is flying Posted on 2015-6-3 07:49
Can images in PNG format be grabbed?

Yes. Right now I don't check the file extension, so everything is saved as .jpg. A PNG image will in fact still open with a .jpg extension, and I'll improve the extension handling.
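
One possible way to handle the extension, sketched under assumptions: take it from the image URL and fall back to jpg only when the URL gives no usable hint. The class name ImageExtension, the method extensionOf, and the list of accepted extensions are all hypothetical, and the extension in a URL is only a hint about the real format of the bytes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: picks a file extension from an image URL.
final class ImageExtension {
    // Assumed set of extensions worth keeping; anything else uses the fallback.
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("gif", "png", "jpg", "jpeg"));

    static String extensionOf(String url, String fallback) {
        String path = url;
        // Drop the query string and fragment, which are not part of the file name.
        int cut = path.indexOf('?');
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        cut = path.indexOf('#');
        if (cut >= 0) {
            path = path.substring(0, cut);
        }
        // Keep only the last path segment, then the text after its last dot.
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return fallback;
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return KNOWN.contains(ext) ? ext : fallback;
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("http://example.com/a/b.PNG?x=1", "jpg"));
    }
}
```

The returned value could then be used both as the file suffix and as the format name when saving.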

Little scum · Posted on 6/3/2015 12:52:13 PM

Let me study it

Delver_Si · Posted on 6/3/2015 12:57:13 PM

Little scum Posted on 2015-6-3 12:52
Let me study it

How's the studying going?

Delver_Si · Posted on 6/3/2015 1:05:27 PM

Little scum Posted on 2015-6-3 13:00
I don't have Eclipse installed, so I read it in Notepad: first it grabs the HTML source of the web page, then it gets the value after src, and then it saves the rough ...

That's right

Little scum · Posted on 6/3/2015 9:12:09 PM

microxdd posted on 2015-6-3 21:05
Simple implementation that doesn't depend on other packages

This is forcing me to install MyEclipse!

Delver_Si · Posted on 6/3/2015 11:57:27 PM

microxdd posted on 2015-6-3 21:05
Simple implementation that doesn't depend on other packages

Writing it in raw code like this is too inefficient to develop. Bad review.
