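This article studies the ReceiverTracker source code from the angle of its message communication.
Part 10 covered how a Receiver is started, how it registers, and how it reports data; this article picks up where that one left off. We start from the pushAndReportBlock method of ReceiverSupervisorImpl: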
def pushAndReportBlock(
    receivedBlock: ReceivedBlock,
    metadataOption: Option[Any],
    blockIdOption: Option[StreamBlockId]
  ) {
  val blockId = blockIdOption.getOrElse(nextBlockId)
  val time = System.currentTimeMillis
  val blockStoreResult = receivedBlockHandler.storeBlock(blockId, receivedBlock)
  logDebug(s"Pushed block $blockId in ${(System.currentTimeMillis - time)} ms")
  val numRecords = blockStoreResult.numRecords
  val blockInfo = ReceivedBlockInfo(streamId, numRecords, metadataOption, blockStoreResult)
  trackerEndpoint.askWithRetry[Boolean](AddBlock(blockInfo))
  logDebug(s"Reported block $blockId")
}
It reports an AddBlock message to trackerEndpoint. blockInfo is an instance of ReceivedBlockInfo, which is just a simple case class:
private[streaming] case class ReceivedBlockInfo(
    streamId: Int,
    numRecords: Option[Long],
    metadataOption: Option[Any],
    blockStoreResult: ReceivedBlockStoreResult
  ) {

  require(numRecords.isEmpty || numRecords.get >= 0, "numRecords must not be negative")

  @volatile private var _isBlockIdValid = true

  def blockId: StreamBlockId = blockStoreResult.blockId

  def walRecordHandleOption: Option[WriteAheadLogRecordHandle] = {
    blockStoreResult match {
      case walStoreResult: WriteAheadLogBasedStoreResult => Some(walStoreResult.walRecordHandle)
      case _ => None
    }
  }

  /** Is the block ID valid, that is, is the block present in the Spark executors. */
  def isBlockIdValid(): Boolean = _isBlockIdValid

  /**
   * Set the block ID as invalid. This is useful when it is known that the block is not present
   * in the Spark executors.
   */
  def setBlockIdInvalid(): Unit = {
    _isBlockIdValid = false
  }
}
It carries little information of its own, so look at ReceivedBlockStoreResult:
private[streaming] trait ReceivedBlockStoreResult {
  // Any implementation of this trait will store a block id
  def blockId: StreamBlockId
  // Any implementation of this trait will have to return the number of records
  def numRecords: Option[Long]
}
It is only a trait. It has two implementations, WriteAheadLogBasedStoreResult and BlockManagerBasedStoreResult. WriteAheadLogBasedStoreResult additionally carries a WriteAheadLogRecordHandle.
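The relationship between the trait, its two implementations, and the optional log handle can be sketched roughly as follows. This is a minimal sketch, not the Spark classes: BlockId, WalHandle, BlockManagerResult, WalResult and walHandleOf are hypothetical stand-ins with far fewer fields than the real types.

```scala
// Hypothetical stand-ins; the real Spark types carry more information.
final case class BlockId(name: String)
final case class WalHandle(path: String, offset: Long, length: Int)

sealed trait StoreResult {
  def blockId: BlockId
  def numRecords: Option[Long]
}

// Block kept only in the block manager: there is no log handle.
final case class BlockManagerResult(blockId: BlockId, numRecords: Option[Long])
  extends StoreResult

// Block also written to a write-ahead log: carries a handle to the log record.
final case class WalResult(blockId: BlockId, numRecords: Option[Long], handle: WalHandle)
  extends StoreResult

object StoreResultSketch {
  // Same shape as walRecordHandleOption: Some(handle) only for the WAL-backed result.
  def walHandleOf(result: StoreResult): Option[WalHandle] = result match {
    case w: WalResult => Some(w.handle)
    case _            => None
  }
}
```

Pattern matching on the concrete subtype is what lets the caller treat both results uniformly while still recovering the handle when one exists.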
Now see how receiveAndReply in ReceiverTrackerEndpoint handles the AddBlock message:
case AddBlock(receivedBlockInfo) =>
  if (WriteAheadLogUtils.isBatchingEnabled(ssc.conf, isDriver = true)) {
    walBatchingThreadPool.execute(new Runnable {
      override def run(): Unit = Utils.tryLogNonFatalError {
        if (active) {
          context.reply(addBlock(receivedBlockInfo))
        } else {
          throw new IllegalStateException("ReceiverTracker RpcEndpoint shut down.")
        }
      }
    })
  } else {
    context.reply(addBlock(receivedBlockInfo))
  }
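It first checks whether batching of write-ahead log (WAL) writes is enabled on the driver, which is true by default. If it is, the call is handed to a thread pool (walBatchingThreadPool) and the reply is sent from there; otherwise the reply is sent inline. Either way, addBlock(receivedBlockInfo) is what eventually runs: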
private def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  receivedBlockTracker.addBlock(receivedBlockInfo)
}
This method does nothing itself and simply delegates to receivedBlockTracker. The ReceivedBlockTracker is created when ReceiverTracker is instantiated. Its addBlock method is:
def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  try {
    val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo))
    if (writeResult) {
      synchronized {
        getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
      }
      logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId}")
    } else {
      logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
    }
    writeResult
  } catch {
    case NonFatal(e) =>
      logError(s"Error adding block $receivedBlockInfo", e)
      false
  }
}
It first calls writeToLog, passing in receivedBlockInfo wrapped in a BlockAdditionEvent. writeToLog is:
private def writeToLog(record: ReceivedBlockTrackerLogEvent): Boolean = {
  if (isWriteAheadLogEnabled) {
    logTrace(s"Writing record: $record")
    try {
      writeAheadLogOption.get.write(ByteBuffer.wrap(Utils.serialize(record)),
        clock.getTimeMillis())
      true
    } catch {
      case NonFatal(e) =>
        logWarning(s"Exception thrown while writing record: $record to the WriteAheadLog.", e)
        false
    }
  } else {
    true
  }
}
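If the WAL is enabled, the record is serialized and written to the log; the method returns true when the write succeeds and false when it throws. If the WAL is not enabled, it returns true directly.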
Back at the if (writeResult) check: receivedBlockInfo is appended to the queue returned by getReceivedBlockQueue. Here is getReceivedBlockQueue:
private def getReceivedBlockQueue(streamId: Int): ReceivedBlockQueue = {
  streamIdToUnallocatedBlockQueues.getOrElseUpdate(streamId, new ReceivedBlockQueue)
}
It looks up the ReceivedBlockQueue for the stream in streamIdToUnallocatedBlockQueues and, if none exists yet, stores a new one; receivedBlockInfo is then appended to that queue. Each receiver has its own queue. streamIdToUnallocatedBlockQueues is declared as:
private type ReceivedBlockQueue = mutable.Queue[ReceivedBlockInfo]
private val streamIdToUnallocatedBlockQueues = new mutable.HashMap[Int, ReceivedBlockQueue]
After the enqueue, addBlock returns writeResult (true or false), which tells the caller whether the metadata was accepted.
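The "log first, enqueue only on success" flow described above can be sketched as below. This is a simplified illustration under assumptions: BlockInfo and UnallocatedBlocks are hypothetical names, and the writeToLog function passed to the constructor stands in for the real WAL write.

```scala
import scala.collection.mutable

// Hypothetical, simplified block metadata.
final case class BlockInfo(streamId: Int, blockId: String)

// `writeToLog` stands in for the write-ahead-log write; it returns true on success.
class UnallocatedBlocks(writeToLog: BlockInfo => Boolean) {
  // One queue per stream id, created lazily on first use.
  private val queues = mutable.HashMap.empty[Int, mutable.Queue[BlockInfo]]

  private def queueFor(streamId: Int): mutable.Queue[BlockInfo] =
    queues.getOrElseUpdate(streamId, mutable.Queue.empty[BlockInfo])

  // Enqueue only when the log write succeeded; report the outcome to the caller.
  def addBlock(info: BlockInfo): Boolean = {
    val written = writeToLog(info)
    if (written) synchronized { queueFor(info.streamId) += info }
    written
  }

  // Snapshot of the blocks still waiting to be allocated for one stream.
  def pending(streamId: Int): Seq[BlockInfo] = synchronized {
    queues.get(streamId).map(_.toList).getOrElse(Nil)
  }
}
```

Writing to the log before touching the in-memory queue means a block is never visible to the allocator unless its metadata was durably recorded (when the log is in use).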
receivedBlockInfo is now in the queue, so when is it consumed? We came across this when looking at dynamic job generation: the generateJobs method of JobGenerator contains this line:
// allocate received blocks to batch
jobScheduler.receiverTracker.allocateBlocksToBatch(time)
Here is ReceiverTracker's allocateBlocksToBatch method:
def allocateBlocksToBatch(batchTime: Time): Unit = {
  if (receiverInputStreams.nonEmpty) {
    receivedBlockTracker.allocateBlocksToBatch(batchTime)
  }
}
This calls receivedBlockTracker's allocateBlocksToBatch(batchTime), shown next:
/**
 * Allocate all unallocated blocks to the given batch.
 * This event will get written to the write ahead log (if enabled).
 */
def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
  if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
    val streamIdToBlocks = streamIds.map { streamId =>
      (streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
    }.toMap
    val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
    if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
      timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
      lastAllocatedBatchTime = batchTime
    } else {
      logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
    }
  } else {
    // This situation occurs when:
    // 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
    // possibly processed batch job or half-processed batch job need to be processed again,
    // so the batchTime will be equal to lastAllocatedBatchTime.
    // 2. Slow checkpointing makes recovered batch time older than WAL recovered
    // lastAllocatedBatchTime.
    // This situation will only occurs in recovery time.
    logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
  }
}
This builds streamIdToBlocks: Map[Int, Seq[ReceivedBlockInfo]] by draining, for each receiver, its list of ReceivedBlockInfo from streamIdToUnallocatedBlockQueues.
writeToLog writes the event to the log if the WAL is enabled; otherwise it returns true directly.
The line timeToAllocatedBlocks.put(batchTime, allocatedBlocks) stores allocatedBlocks, the metadata received by all receivers, in timeToAllocatedBlocks keyed by batch time, and lastAllocatedBatchTime is then updated.
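The allocation step can be sketched as follows: every per-stream queue is drained into a map that is stored under the batch time, and only a strictly newer batch time is accepted. This is a minimal sketch under assumptions, not the Spark implementation: BatchAllocator is a hypothetical class, batch times are plain Long milliseconds, blocks are Strings, and the WAL write is left out.

```scala
import scala.collection.mutable

// Hypothetical sketch: batch times are Long milliseconds, blocks are Strings.
class BatchAllocator(streamIds: Seq[Int]) {
  private val unallocated = mutable.HashMap.empty[Int, mutable.Queue[String]]
  private val allocated = mutable.HashMap.empty[Long, Map[Int, Seq[String]]]
  private var lastAllocated: Option[Long] = None

  def addBlock(streamId: Int, block: String): Unit = synchronized {
    unallocated.getOrElseUpdate(streamId, mutable.Queue.empty[String]) += block
  }

  // Moves every pending block into the given batch, but only for a strictly newer batch time.
  def allocate(batchTime: Long): Boolean = synchronized {
    if (lastAllocated.forall(batchTime > _)) {
      val drained = streamIds.map { id =>
        val queue = unallocated.getOrElseUpdate(id, mutable.Queue.empty[String])
        id -> queue.dequeueAll(_ => true).toList
      }.toMap
      allocated.put(batchTime, drained)
      lastAllocated = Some(batchTime)
      true
    } else {
      false
    }
  }

  // Blocks allocated to a batch, or an empty map if the batch is unknown.
  def blocksOf(batchTime: Long): Map[Int, Seq[String]] = synchronized {
    allocated.getOrElse(batchTime, Map.empty[Int, Seq[String]])
  }
}
```

Because dequeueAll empties each queue, a block belongs to exactly one batch; blocks that arrive after the drain wait for the next batch time.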
So when is the data in timeToAllocatedBlocks read? timeToAllocatedBlocks is filled while jobs are being generated, and the data is consumed inside RDDs, so a reasonable guess is that it is used when the RDD is created.
Look for the first RDD, which is BlockRDD. The path to timeToAllocatedBlocks turns up in the compute method of ReceiverInputDStream:
override def compute(validTime: Time): Option[RDD[T]] = {
  val blockRDD = {
    if (validTime < graph.startTime) {
      // If this is called for any time before the start time of the context,
      // then this returns an empty RDD. This may happen when recovering from a
      // driver failure without any write ahead log to recover pre-failure data.
      new BlockRDD[T](ssc.sc, Array.empty)
    } else {
      // Otherwise, ask the tracker for all the blocks that have been allocated to this stream
      // for this batch
      val receiverTracker = ssc.scheduler.receiverTracker
      // Look up, by batch time, the block metadata received for this stream
      val blockInfos = receiverTracker.getBlocksOfBatch(validTime).getOrElse(id, Seq.empty)

      // Register the input blocks information into InputInfoTracker
      val inputInfo = StreamInputInfo(id, blockInfos.flatMap(_.numRecords).sum)
      ssc.scheduler.inputInfoTracker.reportInfo(validTime, inputInfo)

      // Create the BlockRDD
      createBlockRDD(validTime, blockInfos)
    }
  }
  Some(blockRDD)
}
Follow receiverTracker.getBlocksOfBatch:
def getBlocksOfBatch(batchTime: Time): Map[Int, Seq[ReceivedBlockInfo]] = {
  receivedBlockTracker.getBlocksOfBatch(batchTime)
}
It delegates to the getBlocksOfBatch method of ReceivedBlockTracker:
def getBlocksOfBatch(batchTime: Time): Map[Int, Seq[ReceivedBlockInfo]] = synchronized {
  timeToAllocatedBlocks.get(batchTime).map { _.streamIdToAllocatedBlocks }.getOrElse(Map.empty)
}
Here, finally, timeToAllocatedBlocks is read.
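The lookup chain in compute and getBlocksOfBatch defaults twice (a missing batch yields an empty map, a missing stream yields an empty sequence) and then sums only the record counts that are known. A minimal sketch of that pattern, assuming hypothetical names (BatchLookupSketch, BlockInfo, blocksFor, recordCount) and Long batch times:

```scala
object BatchLookupSketch {
  // Hypothetical block metadata: the record count may be unknown (None).
  final case class BlockInfo(blockId: String, numRecords: Option[Long])

  // stream id -> blocks allocated to that stream
  type Allocation = Map[Int, Seq[BlockInfo]]

  // Blocks of one stream in one batch; a missing batch or stream yields an empty Seq.
  def blocksFor(
      allocations: Map[Long, Allocation],
      batchTime: Long,
      streamId: Int): Seq[BlockInfo] =
    allocations
      .getOrElse(batchTime, Map.empty[Int, Seq[BlockInfo]])
      .getOrElse(streamId, Seq.empty[BlockInfo])

  // Sums only the known counts; flatMap over Option drops the None entries.
  def recordCount(blocks: Seq[BlockInfo]): Long =
    blocks.flatMap(_.numRecords).sum
}
```

With this shape the caller never has to handle a missing batch specially: it simply gets an empty sequence and a count of 0.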
Next, look at the stop method of ReceiverTracker:
def stop(graceful: Boolean): Unit = synchronized {
  if (isTrackerStarted) {
    // First, stop the receivers
    trackerState = Stopping
    if (!skipReceiverLaunch) {
      // Send the stop signal to all the receivers
      endpoint.askWithRetry[Boolean](StopAllReceivers)

      // Wait for the Spark job that runs the receivers to be over
      // That is, for the receivers to quit gracefully.
      receiverJobExitLatch.await(10, TimeUnit.SECONDS)

      if (graceful) {
        logInfo("Waiting for receiver job to terminate gracefully")
        receiverJobExitLatch.await()
        logInfo("Waited for receiver job to terminate gracefully")
      }

      // Check if all the receivers have been deregistered or not
      val receivers = endpoint.askWithRetry[Seq[Int]](AllReceiverIds)
      if (receivers.nonEmpty) {
        logWarning("Not all of the receivers have deregistered, " + receivers)
      } else {
        logInfo("All of the receivers have deregistered successfully")
      }
    }

    // Finally, stop the endpoint
    ssc.env.rpcEnv.stop(endpoint)
    endpoint = null
    receivedBlockTracker.stop()
    logInfo("ReceiverTracker stopped")
    trackerState = Stopped
  }
}
It sends a StopAllReceivers message to endpoint to stop all receivers. The message is handled as follows:
case StopAllReceivers =>
  assert(isTrackerStopping || isTrackerStopped)
  stopReceivers()
  context.reply(true)
Next, the stopReceivers() method:
private def stopReceivers() {
  receiverTrackingInfos.values.flatMap(_.endpoint).foreach { _.send(StopReceiver) }
  logInfo("Sent stop signal to all " + receiverTrackingInfos.size + " receivers")
}
It sends a StopReceiver message to every receiver. The endpoint in ReceiverSupervisorImpl handles that message like this:
case StopReceiver =>
  logInfo("Received stop signal")
  ReceiverSupervisorImpl.this.stop("Stopped by driver", None)
This calls the stop method of ReceiverSupervisorImpl:
def stop(message: String, error: Option[Throwable]) {
  stoppingError = error.orNull
  stopReceiver(message, error)
  onStop(message, error)
  futureExecutionContext.shutdownNow()
  stopLatch.countDown()
}
Start with stopReceiver:
def stopReceiver(message: String, error: Option[Throwable]): Unit = synchronized {
  try {
    logInfo("Stopping receiver with message: " + message + ": " + error.getOrElse(""))
    receiverState match {
      case Initialized =>
        logWarning("Skip stopping receiver because it has not yet stared")
      case Started =>
        receiverState = Stopped
        receiver.onStop()
        logInfo("Called receiver onStop")
        onReceiverStop(message, error)
      case Stopped =>
        logWarning("Receiver has been stopped")
    }
  } catch {
    case NonFatal(t) =>
      logError("Error stopping receiver " + streamId + t.getStackTraceString)
  }
}
First, it calls the receiver's onStop method, receiver.onStop(). Taking KafkaReceiver's onStop() as an example:
def onStop() {
  if (consumerConnector != null) {
    consumerConnector.shutdown()
    consumerConnector = null
  }
}
It shuts down the consumer connection, which stops data from being received.
Second, it calls onReceiverStop. The implementation in ReceiverSupervisorImpl, the subclass of ReceiverSupervisor, is:
override protected def onReceiverStop(message: String, error: Option[Throwable]) {
  logInfo("Deregistering receiver " + streamId)
  val errorString = error.map(Throwables.getStackTraceAsString).getOrElse("")
  trackerEndpoint.askWithRetry[Boolean](DeregisterReceiver(streamId, message, errorString))
  logInfo("Stopped receiver " + streamId)
}
It sends a DeregisterReceiver message to trackerEndpoint to deregister the receiver.
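The state check in stopReceiver ensures that the receiver's onStop and the deregistration run at most once, and only for a receiver that was actually started. A minimal sketch of that guard, assuming hypothetical names: Supervisor is not the Spark class, and the onStop and deregister callbacks stand in for receiver.onStop() and the DeregisterReceiver message.

```scala
object ReceiverStopSketch {
  sealed trait State
  case object Initialized extends State
  case object Started extends State
  case object Stopped extends State

  // Hypothetical supervisor: `onStop` and `deregister` stand in for receiver.onStop()
  // and for sending DeregisterReceiver to the tracker.
  class Supervisor(onStop: () => Unit, deregister: () => Unit) {
    private var state: State = Initialized

    def start(): Unit = synchronized { state = Started }

    // Returns true only when this call actually stopped a running receiver.
    def stopReceiver(): Boolean = synchronized {
      state match {
        case Started =>
          state = Stopped
          onStop()
          deregister()
          true
        case Initialized | Stopped =>
          false
      }
    }
  }
}
```

Setting the state to Stopped before invoking the callbacks mirrors the order in the quoted code, so a repeated stop request becomes a no-op.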
Now the onStop() method, as overridden in ReceiverSupervisorImpl, the subclass of ReceiverSupervisor:
override protected def onStop(message: String, error: Option[Throwable]) {
  registeredBlockGenerators.foreach { _.stop() }
  env.rpcEnv.stop(endpoint)
}
It calls stop on every registered BlockGenerator. That stop method is:
def stop(): Unit = {
  // Set the state to stop adding data
  synchronized {
    if (state == Active) {
      state = StoppedAddingData
    } else {
      logWarning(s"Cannot stop BlockGenerator as its not in the Active state [state = $state]")
      return
    }
  }

  // Stop generating blocks and set the state for block pushing thread to start draining the queue
  logInfo("Stopping BlockGenerator")
  blockIntervalTimer.stop(interruptTimer = false)
  synchronized { state = StoppedGeneratingBlocks }

  // Wait for the queue to drain and mark generated as stopped
  logInfo("Waiting for block pushing thread to terminate")
  blockPushingThread.join()
  synchronized { state = StoppedAll }
  logInfo("Stopped BlockGenerator")
}
The main step is stopping the timer with blockIntervalTimer.stop; what blockIntervalTimer does was explained in the previous part.
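The ordering in stop (refuse new data, stop producing blocks, then join the pushing thread so the queue drains) can be illustrated with a much simplified sketch. DrainingGenerator is a hypothetical class, not BlockGenerator: it has no timer or buffer, and add puts blocks straight onto the queue that the pushing thread drains.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, LinkedBlockingQueue, TimeUnit}

// Hypothetical generator: callers enqueue blocks, a pushing thread drains them.
class DrainingGenerator {
  private val queue = new LinkedBlockingQueue[String]()
  private val pushed = new ConcurrentLinkedQueue[String]()
  @volatile private var accepting = true
  @volatile private var generating = true

  private val pusher = new Thread(new Runnable {
    override def run(): Unit = {
      // Keep polling while blocks may still arrive.
      while (generating) {
        val block = queue.poll(10, TimeUnit.MILLISECONDS)
        if (block != null) pushed.add(block)
      }
      // No more blocks can arrive: drain whatever is left, then exit.
      var rest = queue.poll()
      while (rest != null) {
        pushed.add(rest)
        rest = queue.poll()
      }
    }
  })
  pusher.start()

  // Returns false once stop() has begun.
  def add(block: String): Boolean = synchronized {
    if (accepting) { queue.put(block); true } else false
  }

  def stop(): Unit = {
    synchronized { accepting = false } // 1. stop adding data
    generating = false                 // 2. stop generating blocks
    pusher.join()                      // 3. wait for the queue to drain
  }

  def pushedCount: Int = pushed.size
}
```

Because add is refused before the pushing thread is told to finish, every block accepted before stop() is pushed by the time join() returns.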
The other ReceiverTracker messages will be covered later.
Author: 海纳百川_spark
Link: https://www.jianshu.com/p/43c890d623e2