Shutting Down a Thread Pool While Recording Tasks That Were Running but Unfinished: False Positives and Idempotence

Source: Baidu Wenku   Editor: Jiuxiang News   Date: 2024/04/25 16:43:14
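Use TrackingExecutor as a proxy for an ExecutorService. When tasks are cancelled at shutdown, the ones that had not finished are recorded so that they can be resumed the next time the service starts.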
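
For context, a plain ExecutorService.shutdownNow() only hands back tasks that were still waiting in the queue. A task that is already running is interrupted and is not reported anywhere, which is the gap TrackingExecutor fills. The following is a minimal sketch using only the JDK; the class name ShutdownNowDemo, the Result holder and the task bodies are illustrative assumptions.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ShutdownNowDemo {
    static final class Result {
        List<Runnable> neverStarted;  // what shutdownNow() returned
        Runnable running;             // the task that was executing at shutdown
        volatile boolean interrupted; // set if that task was interrupted
    }

    // One long task occupies the single worker thread while two more wait in
    // the queue; shutdownNow() is called while the long task is still running.
    static Result run() {
        Result r = new Result();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        r.running = () -> {
            started.countDown();
            try {
                Thread.sleep(10_000);   // simulated slow work
            } catch (InterruptedException e) {
                r.interrupted = true;   // cancelled mid-flight, but nobody records it
            }
        };
        pool.execute(r.running);
        pool.execute(() -> System.out.println("queued task 1"));
        pool.execute(() -> System.out.println("queued task 2"));
        try {
            started.await();                     // the long task is now running
            r.neverStarted = pool.shutdownNow(); // drains the queue, interrupts the worker
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = run();
        System.out.println(r.neverStarted.size());              // 2
        System.out.println(r.neverStarted.contains(r.running)); // false
        System.out.println(r.interrupted);                      // true
    }
}
```

Only the two queued tasks come back; the interrupted task is lost unless something like TrackingExecutor notices it.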

TrackingExecutor has an unavoidable race condition that could make it yield false positives: tasks that are identified as cancelled but actually completed. This arises because the thread pool could be shut down between when the last instruction of the task executes and when the pool records the task as complete. This is not a problem if tasks are idempotent (if performing them twice has the same effect as performing them once), as they typically are in a web crawler. Otherwise, the application retrieving the cancelled tasks must be aware of this risk and be prepared to deal with false positives.
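
To make the idempotence argument concrete, here is a small sketch that simulates a false positive by simply running the same task twice. Recording a URL in a set is idempotent, while incrementing a counter is not. The countPageOnce guard, which skips the effect when a task ID has already completed, is a hypothetical illustration of one way to "be prepared to deal with false positives", not part of TrackingExecutor.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class RerunDemo {
    // Idempotent: adding a URL that is already present leaves the set unchanged.
    static void markCrawled(Set<String> crawled, String url) {
        crawled.add(url);
    }

    // Not idempotent: every run adds one more to the total.
    static void countPage(AtomicInteger pagesCounted) {
        pagesCounted.incrementAndGet();
    }

    // Hypothetical guard that makes countPage safe to rerun: skip the effect if
    // this task ID already completed. In a real system the ID and the effect
    // would need to be recorded atomically, e.g. in one database transaction.
    static void countPageOnce(Set<String> completedIds, String taskId,
                              AtomicInteger pagesCounted) {
        if (completedIds.add(taskId))   // add() returns false if the ID was present
            pagesCounted.incrementAndGet();
    }

    public static void main(String[] args) {
        Set<String> crawled = ConcurrentHashMap.newKeySet();
        AtomicInteger naive = new AtomicInteger();
        AtomicInteger guarded = new AtomicInteger();
        Set<String> completedIds = ConcurrentHashMap.newKeySet();

        // Run 1: the task completes just before shutdown. Run 2: it was reported
        // as cancelled anyway (a false positive) and is rerun after restart.
        for (int run = 1; run <= 2; run++) {
            markCrawled(crawled, "http://example.com/");
            countPage(naive);
            countPageOnce(completedIds, "count-example.com", guarded);
        }
        System.out.println(crawled.size()); // 1: same as running once
        System.out.println(naive.get());    // 2: double-counted
        System.out.println(guarded.get());  // 1: rerun skipped
    }
}
```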

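import java.util.*;
import java.util.concurrent.*;

public class TrackingExecutor extends AbstractExecutorService {
    private final ExecutorService exec;
    // Tasks that were still running when shutdownNow() interrupted them
    private final Set<Runnable> tasksCancelledAtShutdown =
            Collections.synchronizedSet(new HashSet<Runnable>());

    public TrackingExecutor(ExecutorService exec) {
        this.exec = exec;
    }

    public void shutdown() {
        exec.shutdown();
    }

    // Unwraps queued tasks so callers get back the Runnables they submitted
    public List<Runnable> shutdownNow() {
        List<Runnable> neverStarted = new ArrayList<Runnable>();
        for (Runnable r : exec.shutdownNow())
            neverStarted.add(r instanceof TrackedTask ? ((TrackedTask) r).task : r);
        return neverStarted;
    }

    public boolean isShutdown() {
        return exec.isShutdown();
    }

    public boolean isTerminated() {
        return exec.isTerminated();
    }

    public boolean awaitTermination(long timeout, TimeUnit unit)
            throws InterruptedException {
        return exec.awaitTermination(timeout, unit);
    }

    // Valid only after the pool has terminated, so no task can still be added
    public List<Runnable> getCancelledTasks() {
        if (!exec.isTerminated())
            throw new IllegalStateException(/*...*/);
        return new ArrayList<Runnable>(tasksCancelledAtShutdown);
    }

    public void execute(Runnable runnable) {
        exec.execute(new TrackedTask(runnable));
    }

    // Runs the task, then records it if it ended because of shutdownNow()
    private class TrackedTask implements Runnable {
        final Runnable task;

        TrackedTask(Runnable task) {
            this.task = task;
        }

        public void run() {
            try {
                task.run();
            } finally {
                // Race: shutdownNow() may run after task.run() returns but
                // before this check, so a finished task can be recorded too
                if (isShutdown() && Thread.currentThread().isInterrupted())
                    tasksCancelledAtShutdown.add(task);
            }
        }
    }
}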
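
TrackingExecutor's execute() wraps each task, and the wrapper's finally block records the task only if the worker thread is still interrupted when the task returns. A task that catches InterruptedException and swallows it clears that flag, so the tracking silently misses it. The sketch below shows this with a hypothetical tracked() helper that mirrors the same check on a plain ExecutorService; the class, helper and task names are illustrative assumptions.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class InterruptStatusDemo {
    // Hypothetical helper mirroring TrackingExecutor's check: the task is recorded
    // as cancelled if the pool is shut down and the thread is still interrupted.
    static Runnable tracked(ExecutorService pool, Set<String> cancelled,
                            String name, Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                if (pool.isShutdown() && Thread.currentThread().isInterrupted())
                    cancelled.add(name);
            }
        };
    }

    // A long-running task; restoreInterrupt decides whether it re-sets the
    // interrupt flag that Thread.sleep() cleared when it threw.
    static Runnable sleeper(CountDownLatch started, boolean restoreInterrupt) {
        return () -> {
            started.countDown();
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                if (restoreInterrupt)
                    Thread.currentThread().interrupt();
            }
        };
    }

    // Starts one task of each kind, interrupts both via shutdownNow(),
    // and returns the names that the tracking check recorded.
    static Set<String> run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Set<String> cancelled = Collections.synchronizedSet(new HashSet<>());
        CountDownLatch started = new CountDownLatch(2);
        pool.execute(tracked(pool, cancelled, "restores", sleeper(started, true)));
        pool.execute(tracked(pool, cancelled, "swallows", sleeper(started, false)));
        try {
            started.await();  // both tasks are now running
            pool.shutdownNow();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cancelled;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [restores]
    }
}
```

Tasks run under TrackingExecutor should therefore either let InterruptedException propagate or restore the interrupt status before returning.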
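
WebCrawler uses TrackingExecutor to save the URLs of unfinished crawl tasks in stop() and resubmit them in the next start():

import java.net.URL;
import java.util.*;
import java.util.concurrent.*;
import net.jcip.annotations.GuardedBy; // from the jcip-annotations library

import static java.util.concurrent.TimeUnit.MILLISECONDS;

public abstract class WebCrawler {
    private volatile TrackingExecutor exec;
    // URLs waiting to be crawled; carried over between stop() and start()
    @GuardedBy("this") private final Set<URL> urlsToCrawl = new HashSet<URL>();

    private final ConcurrentMap<URL, Boolean> seen = new ConcurrentHashMap<URL, Boolean>();
    private static final long TIMEOUT = 500;
    private static final TimeUnit UNIT = MILLISECONDS;

    public WebCrawler(URL startUrl) {
        urlsToCrawl.add(startUrl);
    }

    public synchronized void start() {
        exec = new TrackingExecutor(Executors.newCachedThreadPool());
        for (URL url : urlsToCrawl) submitCrawlTask(url);
        urlsToCrawl.clear();
    }

    public synchronized void stop() throws InterruptedException {
        try {
            // Tasks that were queued but never started
            saveUncrawled(exec.shutdownNow());
            // Tasks interrupted mid-crawl (may include false positives); if the
            // pool does not terminate within TIMEOUT, these are not saved
            if (exec.awaitTermination(TIMEOUT, UNIT))
                saveUncrawled(exec.getCancelledTasks());
        } finally {
            exec = null;
        }
    }

    protected abstract List<URL> processPage(URL url);

    private void saveUncrawled(List<Runnable> uncrawled) {
        for (Runnable task : uncrawled)
            urlsToCrawl.add(((CrawlTask) task).getPage());
    }

    private void submitCrawlTask(URL u) {
        exec.execute(new CrawlTask(u));
    }

    private class CrawlTask implements Runnable {
        private final URL url;

        CrawlTask(URL url) {
            this.url = url;
        }

        private int count = 1;

        boolean alreadyCrawled() {
            return seen.putIfAbsent(url, true) != null;
        }

        void markUncrawled() {
            seen.remove(url);
            System.out.printf("marking %s uncrawled%n", url);
        }

        public void run() {
            for (URL link : processPage(url)) {
                // Stop promptly once shutdownNow() has interrupted this thread
                if (Thread.currentThread().isInterrupted())
                    return;
                submitCrawlTask(link);
            }
        }

        public URL getPage() {
            return url;
        }
    }
}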
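
CrawlTask defines alreadyCrawled() and markUncrawled() on the shared seen map, although run() as shown never calls them. The sketch below illustrates the claim/release pattern they suggest, assuming String URLs instead of java.net.URL and a hypothetical race() driver. ConcurrentMap.putIfAbsent is atomic, so when several tasks race for the same URL exactly one wins, and removing the entry (as markUncrawled() does) makes the URL claimable again, for example after its task was cancelled at shutdown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ClaimDemo {
    static final ConcurrentMap<String, Boolean> seen = new ConcurrentHashMap<>();

    // Same idea as CrawlTask.alreadyCrawled(): only the first caller for a
    // given URL gets null back from putIfAbsent and thus wins the claim.
    static boolean alreadyCrawled(String url) {
        return seen.putIfAbsent(url, true) != null;
    }

    // Same idea as CrawlTask.markUncrawled(): release the claim.
    static void markUncrawled(String url) {
        seen.remove(url);
    }

    // Hypothetical driver: `threads` tasks race for one URL; returns the winners.
    static int race(String url, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch go = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    go.await();             // start all tasks at roughly the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (!alreadyCrawled(url))
                    winners.incrementAndGet();
            });
        }
        go.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return winners.get();
    }

    public static void main(String[] args) {
        String url = "http://example.com/";
        System.out.println(race(url, 8)); // 1: exactly one task claims the URL
        System.out.println(race(url, 8)); // 0: already claimed
        markUncrawled(url);               // e.g. its task was cancelled at shutdown
        System.out.println(race(url, 8)); // 1: claimable again
    }
}
```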