Exception Handling Flow for Java Crashes and Native Crashes on Android P

Tags: Android

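App crashes are usually caused by either a Java crash or a native crash. Based on the Android P source code, this article walks through how each of the two is handled.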

1. Java Crash

When an exception in Java code is not caught by any try/catch block, the virtual machine calls Thread#dispatchUncaughtException to handle it:

// libcore/ojluni/src/main/java/java/lang/Thread.java
public final void dispatchUncaughtException(Throwable e) {
    Thread.UncaughtExceptionHandler initialUeh =
        Thread.getUncaughtExceptionPreHandler();
    if (initialUeh != null) {
        try {
            initialUeh.uncaughtException(this, e);
        } catch (RuntimeException | Error ignored) {
            // Throwables thrown by the initial handler are ignored
        }
    }
    getUncaughtExceptionHandler().uncaughtException(this, e);
}

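Two UncaughtExceptionHandler instances take part in this flow: the pre-handler (PreHandler) and the handler (Handler). The pre-handler runs first, and anything it throws is ignored. The core of each is its own implementation of uncaughtException.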
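
To make the ordering concrete, here is a minimal sketch of the same two-stage dispatch that runs on a plain JVM. The class TwoStageDispatch and its fields are hypothetical stand-ins written for illustration; the real pre-handler slot is a hidden field inside java.lang.Thread on Android.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mimics the two-stage dispatch shown above. */
final class TwoStageDispatch {
    private final Thread.UncaughtExceptionHandler preHandler; // may be null
    private final Thread.UncaughtExceptionHandler handler;

    TwoStageDispatch(Thread.UncaughtExceptionHandler preHandler,
                     Thread.UncaughtExceptionHandler handler) {
        this.preHandler = preHandler;
        this.handler = handler;
    }

    void dispatch(Thread t, Throwable e) {
        if (preHandler != null) {
            try {
                preHandler.uncaughtException(t, e);
            } catch (RuntimeException | Error ignored) {
                // A failing pre-handler must not stop the main handler from running.
            }
        }
        handler.uncaughtException(t, e);
    }

    /** Runs one dispatch and returns the order in which the handlers fired. */
    static List<String> demo() {
        List<String> calls = new ArrayList<>();
        TwoStageDispatch dispatcher = new TwoStageDispatch(
                (t, e) -> {
                    calls.add("pre:" + e.getMessage());
                    throw new IllegalStateException("pre-handler bug");
                },
                (t, e) -> calls.add("main:" + e.getMessage()));
        dispatcher.dispatch(Thread.currentThread(), new RuntimeException("boom"));
        return calls;
    }
}
```

Calling TwoStageDispatch.demo() records the pre-handler first and the main handler second, even though the pre-handler itself throws.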

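Android provides default implementations of both. On Android, application processes are forked from the Zygote process. When a process is started through Zygote, the zygoteInit method calls RuntimeInit.commonInit: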

// frameworks/base/core/java/com/android/internal/os/ZygoteInit.java
/**
 * The main function called when started through the zygote process...
 */
public static final Runnable zygoteInit(int targetSdkVersion, String[] argv, ClassLoader classLoader) {
    // ...
    RuntimeInit.commonInit();
    ZygoteInit.nativeZygoteInit();
    return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader);
}

RuntimeInit.commonInit installs the default UncaughtExceptionHandler instances:

// frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
protected static final void commonInit() {
    // ...
    /*
     * set handlers; these apply to all threads in the VM. Apps can replace
     * the default handler, but not the pre handler.
     */
    LoggingHandler loggingHandler = new LoggingHandler();
    Thread.setUncaughtExceptionPreHandler(loggingHandler);
    Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));
    // ...
}

Two objects are instantiated here, LoggingHandler and KillApplicationHandler. Both implement the Thread.UncaughtExceptionHandler interface and override its uncaughtException method:

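LoggingHandler prints the exception details, including the process name, pid and Java stack trace.

In the system process, the log starts with "*** FATAL EXCEPTION IN SYSTEM PROCESS: "
In an application process, the log starts with "FATAL EXCEPTION: "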

KillApplicationHandler notifies AMS (ActivityManagerService) and kills the process. Its code is:

@Override
public void uncaughtException(Thread t, Throwable e) {
    try {
        // 1. Make sure LoggingHandler has printed the exception (added in Android 9.0)
        ensureLogging(t, e);
        // 2. Notify AMS so it can handle the crash, e.g. show the crash dialog
        ActivityManager.getService().handleApplicationCrash(
                   mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
    } catch (Throwable t2) {
        // ...
    } finally {
        // 3. Make sure the process dies
        Process.killProcess(Process.myPid()); // sends signal 9 (SIGKILL) to itself
        System.exit(10); // the Java way to end the process: shuts down the Java VM
    }
}

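Note 1:

Thread#setDefaultUncaughtExceptionHandler is a public API. An app can call it to install its own UncaughtExceptionHandler in place of KillApplicationHandler, handle the exception with custom logic, and so avoid the crash.
Thread#setUncaughtExceptionPreHandler is a hidden API. Apps cannot call it, so they cannot replace LoggingHandler.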

/**
 * ......
 * @hide only for use by the Android framework (RuntimeInit) b/29624607
 */
public static void setUncaughtExceptionPreHandler(UncaughtExceptionHandler eh) {
    uncaughtExceptionPreHandler = eh;
}
// ...
public static void setDefaultUncaughtExceptionHandler(UncaughtExceptionHandler eh) {
    defaultUncaughtExceptionHandler = eh;
}

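As a result, the following situation is common:
After an app throws an uncaught exception, LoggingHandler prints the "FATAL EXCEPTION" message to the log, but because the app has replaced KillApplicationHandler, the process does not exit and AMS is never notified. The app keeps running normally.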
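
The replacement described in Note 1 can be sketched with the public API alone. The example below runs on a plain JVM rather than on Android, and the class name CustomHandlerDemo and the thread name "worker-1" are made up for illustration. It installs a custom default handler, lets a worker thread die from an uncaught exception, and shows that the process carries on.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo: a custom default handler observes the crash and the process survives. */
final class CustomHandlerDemo {
    /** Returns what the custom handler saw when the worker thread crashed. */
    static String runCrashingThread() {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        // Replace the default handler (on Android this would displace KillApplicationHandler).
        Thread.setDefaultUncaughtExceptionHandler((thread, error) ->
                seen.set(thread.getName() + ": " + error.getMessage()));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom");
            }, "worker-1");
            worker.start();
            worker.join(); // the handler runs on the dying thread before join() returns
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        } finally {
            // Restore whatever was installed before, so other code is unaffected.
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen.get();
    }
}
```

A real app would usually keep a reference to the previous handler and delegate to it after its own bookkeeping, so that the system can still end the process when recovery is not possible.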

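Note 2:
By default, after an uncaught exception, KillApplicationHandler calls System.exit(10) to shut down the process's Java VM. If some code in the process tries to start a new thread at that point, it fails with the error "Thread starting during runtime shutdown":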

java.lang.InternalError: Thread starting during runtime shutdown
at java.lang.Thread.nativeCreate(Native Method)
at java.lang.Thread.start(Thread.java:733)

If you see this Error in a log, first look for the real cause of the process's abnormal exit.

2. Native Crash

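When a native exception occurs, the CPU raises a hardware exception, which starts the exception handling flow. The Linux kernel handles it and delivers it to the process as a signal. A process can register handlers to receive these signals.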

On Android P, the default signal handlers are registered from bionic/linker/linker_main.cpp, which calls debuggerd_init. The relevant code in linker_main.cpp:

// bionic/linker/linker_main.cpp
/*
 * This code is called after the linker has linked itself and
 * fixed it's own GOT. It is safe to make references to externs
 * and other non-local data at this point.
 */
static ElfW(Addr) __linker_init_post_relocation(KernelArgumentBlock& args) {
    // ...
    debuggerd_init(&callbacks);
}

debuggerd_init performs the signal handler registration:

// system/core/debuggerd/handler/debuggerd_handler.cpp
void debuggerd_init(debuggerd_callbacks_t* callbacks) {
    // ...
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    sigfillset(&action.sa_mask);
    action.sa_sigaction = debuggerd_signal_handler;
    action.sa_flags = SA_RESTART | SA_SIGINFO;
    // Use the alternate signal stack if available so we can catch stack overflows.
    action.sa_flags |= SA_ONSTACK;
    debuggerd_register_handlers(&action);
}

As shown above, the default signal handler is debuggerd_signal_handler. Which signals is it registered for? That is decided in debuggerd_register_handlers:

// system/core/debuggerd/include/debuggerd/handler.h
static void __attribute__((__unused__)) debuggerd_register_handlers(struct sigaction* action) {
    sigaction(SIGABRT, action, nullptr);
    sigaction(SIGBUS, action, nullptr);
    sigaction(SIGFPE, action, nullptr);
    sigaction(SIGILL, action, nullptr);
    sigaction(SIGSEGV, action, nullptr);
#if defined(SIGSTKFLT)
    sigaction(SIGSTKFLT, action, nullptr);
#endif
    sigaction(SIGSYS, action, nullptr);
    sigaction(SIGTRAP, action, nullptr);
    sigaction(DEBUGGER_SIGNAL, action, nullptr);
}

Through sigaction, the handler is registered for these signals: SIGABRT, SIGBUS, SIGFPE, SIGILL, SIGSEGV, SIGSTKFLT (where defined), SIGSYS, SIGTRAP and DEBUGGER_SIGNAL, 9 in total.
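
When reading crash logs it helps to map the signal number back to its name. The lookup below is a small sketch, not Android code. The class CrashSignals is hypothetical, and the numbers are the usual Linux values on ARM and x86, which is an assumption (some other architectures number a few of these differently). DEBUGGER_SIGNAL is left out because its value is defined by bionic rather than by a fixed POSIX number.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup table for the crash signals registered above. */
final class CrashSignals {
    // Assumption: common Linux numbering on ARM and x86.
    private static final Map<Integer, String> NAMES = new LinkedHashMap<>();

    static {
        NAMES.put(4, "SIGILL");
        NAMES.put(5, "SIGTRAP");
        NAMES.put(6, "SIGABRT");
        NAMES.put(7, "SIGBUS");
        NAMES.put(8, "SIGFPE");
        NAMES.put(11, "SIGSEGV");
        NAMES.put(16, "SIGSTKFLT");
        NAMES.put(31, "SIGSYS");
    }

    /** Returns the signal name, or "signal N" when N is not one of the crash signals. */
    static String nameOf(int signo) {
        return NAMES.getOrDefault(signo, "signal " + signo);
    }
}
```

For example, CrashSignals.nameOf(11) yields "SIGSEGV", while a number outside the table such as 9 falls back to "signal 9".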

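From here, when a native exception occurs, the handling flow is as follows:

The app's default signal handler, debuggerd_signal_handler, is called. Its main job is to clone a child of the target process (a pseudo-thread, judging by the CLONE_THREAD | CLONE_VM flags) that runs debuggerd_dispatch_pseudothread. Once that function finishes, the child exits: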

// system/core/debuggerd/handler/debuggerd_handler.cpp
// Handler that does crash dumping by forking and doing the processing in the child.
// Do this by ptracing the relevant thread, and then execing debuggerd to do the actual dump.
static void debuggerd_signal_handler(int signal_number, siginfo_t* info, void* context) {
    // ...
    // 1. Print a "Fatal signal" log line with the basic crash information
    log_signal_summary(info);

    // 2. Clone the child
    pid_t child_pid =
        clone(debuggerd_dispatch_pseudothread, pseudothread_stack,
              CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID,
              &thread_info, nullptr, nullptr, &thread_info.pseudothread_tid);
    // ...
}

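log_signal_summary prints a "Fatal signal" line to the log. Judging from the source comments, the intent is to keep at least one basic record of the native crash in case the later steps fail. For example: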
12-16 14:30:17.067 10177 4780 4780 F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x74 in tid 4780 (com.kevin.test), pid 4780 (com.kevin.test)
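
Because this line is often the only record left when the later steps fail, it is worth being able to pull its fields out mechanically. The parser below is a sketch. The class FatalSignalLine is hypothetical, and the regular expression is inferred from the single sample line above, so other Android versions or signal types may need adjustments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parsed fields of a libc "Fatal signal" log line (illustrative helper, not part of Android). */
final class FatalSignalLine {
    // Assumption: the layout matches the sample line shown above.
    private static final Pattern PATTERN = Pattern.compile(
            "Fatal signal (\\d+) \\(([^)]*)\\), code (-?\\d+) \\(([^)]*)\\), "
            + "fault addr (\\S+) in tid (\\d+) \\(([^)]*)\\), pid (\\d+) \\(([^)]*)\\)");

    final int signal;
    final String signalName;
    final String codeName;
    final String faultAddr;
    final int tid;
    final int pid;
    final String processName;

    private FatalSignalLine(Matcher m) {
        signal = Integer.parseInt(m.group(1));
        signalName = m.group(2);
        codeName = m.group(4);
        faultAddr = m.group(5);
        tid = Integer.parseInt(m.group(6));
        pid = Integer.parseInt(m.group(8));
        processName = m.group(9);
    }

    /** Returns the parsed line, or null when the text holds no "Fatal signal" record. */
    static FatalSignalLine parse(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? new FatalSignalLine(m) : null;
    }
}
```

Feeding the sample line to FatalSignalLine.parse gives signal 11 (SIGSEGV), code name SEGV_MAPERR, fault address 0x74, and tid and pid 4780 for com.kevin.test.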

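After the child is cloned, it runs debuggerd_dispatch_pseudothread, whose main job is to use execle to run /system/bin/crash_dump32 or /system/bin/crash_dump64, passing arguments that include:

main_tid: the id of the thread in which the native crash happened (in the target process)
pseudothread_tid: from a first reading of the code, this appears to be related to obtaining the backtrace; it needs further investigation
debuggerd_dump_type: there are 4 dump types; for a native crash the type is kDebuggerdTombstone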

static int debuggerd_dispatch_pseudothread(void* arg) {
    // ...
    execle(CRASH_DUMP_PATH, CRASH_DUMP_NAME, main_tid, pseudothread_tid, debuggerd_dump_type, nullptr, nullptr);
    // ...
}

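Note: running crash_dump32 or crash_dump64 through execle does not, by itself, create a new process. On Linux, execle replaces the calling process's program image with the new program, whose main function then runs; the pid stays the same.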

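The main function of crash_dump.cpp (system/core/debuggerd/crash_dump.cpp) then runs. This is arguably the core of native crash handling. Its main jobs are:

Attach to the app with ptrace (the source loops over every thread of the app, and calls ReadCrashInfo for the thread that crashed), read the app's registers and other state, and finally collect all the crash information (device model and build, ABI, signal, registers, backtrace and so on) and write it to the log
Notify the tombstoned process over a socket so that all the crash information is written to a /data/tombstones/tombstone_xx file
Notify the system_server process over a socket (the NativeCrashListener thread listens on it), which eventually reaches AMS#handleApplicationCrashInner (from this point the logic is the same as for a Java crash)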

The main code for the logic above:

// system/core/debuggerd/crash_dump.cpp
int main(int argc, char** argv) {
    // ...
    // 1. Attach to the app with ptrace and collect the crash information
    ATRACE_NAME("ptrace");
    for (pid_t thread : threads) {
        // ...
        ThreadInfo info;
        info.pid = target_process;
        info.tid = thread;
        info.process_name = process_name;
        info.thread_name = get_thread_name(thread);

        if (!ptrace_interrupt(thread, &info.signo)) {
            PLOG(WARNING) << "failed to ptrace interrupt thread " << thread;
            ptrace(PTRACE_DETACH, thread, 0, 0);
            continue;
        }

        if (thread == g_target_thread) {
            // Read the thread's registers along with the rest of the crash info out of the pipe.
            ReadCrashInfo(input_pipe, &siginfo, &info.registers, &abort_address);
            info.siginfo = &siginfo;
            info.signo = info.siginfo->si_signo;
        } else {
            info.registers.reset(Regs::RemoteGet(thread));
            if (!info.registers) {
                PLOG(WARNING) << "failed to fetch registers for thread " << thread;
                ptrace(PTRACE_DETACH, thread, 0, 0);
                continue;
            }
        }
        // ...
    }
    // ...
    // 2. Connect to the tombstoned process over a socket, so that tombstoned
    //    writes the crash information to a /data/tombstones/tombstone_xx file
    {
        ATRACE_NAME("tombstoned_connect");
        LOG(INFO) << "obtaining output fd from tombstoned, type: " << dump_type;
        g_tombstoned_connected =
            tombstoned_connect(g_target_thread, &g_tombstoned_socket, &g_output_fd, dump_type);
    }
    // ...
    // 3. Notify the system_server process over a socket
    activity_manager_notify(target_process, signo, amfd_data);
    // ...
}

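Finally, the AMS side. In the system_server process, when AMS starts it first calls startObservingNativeCrashes, which starts a new thread, NativeCrashListener. This thread loops listening on a socket (socket path: /data/system/ndebugsocket) and receives native crash information from the debuggerd side (as analyzed above, the peer is the process running crash_dump). The main code: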

// frameworks/base/services/core/java/com/android/server/am/NativeCrashListener.java
final class NativeCrashListener extends Thread {
    // ...
    @Override
    public void run() {
        // ...
        try {
            FileDescriptor serverFd = Os.socket(AF_UNIX, SOCK_STREAM, 0);
            final UnixSocketAddress sockAddr = UnixSocketAddress.createFileSystem(
                    DEBUGGERD_SOCKET_PATH);
            Os.bind(serverFd, sockAddr);
            Os.listen(serverFd, 1);
            Os.chmod(DEBUGGERD_SOCKET_PATH, 0777);

            while (true) {
                FileDescriptor peerFd = null;
                try {
                    if (MORE_DEBUG) Slog.v(TAG, "Waiting for debuggerd connection");
                    peerFd = Os.accept(serverFd, null /* peerAddress */);
                    if (MORE_DEBUG) Slog.v(TAG, "Got debuggerd socket " + peerFd);
                    if (peerFd != null) {
                        consumeNativeCrashData(peerFd);
                    }
                    // ...
        }
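
The article does not show consumeNativeCrashData, so the following is only a sketch of what decoding such a message could look like. It assumes the wire format is a big-endian int32 pid, a big-endian int32 signal number, and then the report text terminated by a zero byte. The class NativeCrashPacket is hypothetical, and the format should be checked against the AOSP source before relying on it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical decoder for a native crash message (assumed format, see the lead-in). */
final class NativeCrashPacket {
    final int pid;
    final int signal;
    final String report;

    private NativeCrashPacket(int pid, int signal, String report) {
        this.pid = pid;
        this.signal = signal;
        this.report = report;
    }

    static NativeCrashPacket decode(byte[] data) {
        if (data.length < 8) {
            throw new IllegalArgumentException("header needs 8 bytes, got " + data.length);
        }
        ByteBuffer header = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        int pid = header.getInt();
        int signal = header.getInt();
        // The report runs from byte 8 up to the first zero byte (or the end of the data).
        int end = 8;
        while (end < data.length && data[end] != 0) {
            end++;
        }
        String report = new String(data, 8, end - 8, StandardCharsets.UTF_8);
        return new NativeCrashPacket(pid, signal, report);
    }
}
```

With the pid and signal in hand, the listener side can look up the crashed process and hand the report text to the same crash handling path that Java crashes use.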

Author: kevinsong0810
Link: https://www.jianshu.com/p/f39e9265ea66
