Method for realizing task data decoupling in a Spark operation scheduling system

2014 
The invention relates to a method for realizing task data decoupling in a Spark operation scheduling system. Within one iteration cycle, the method comprises the following steps: the system reads the iteration RDD (resilient distributed dataset) information of an iteration state object through a task context object instance and stores that information in a task context object; the system looks up the corresponding RDD information in the task context object through a Spark task object instance and stores it in a task result object; and the system parses the RDD information in the task result object through a task state object instance and stores each piece of RDD information in its corresponding state object. With this method, RDDs can be passed between tasks, or between an earlier and a later cycle of the same task, so that each task can be written in a modular fashion and the method can be applied more widely.
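The three steps of one iteration cycle can be illustrated with a minimal Python sketch. This is not the patented implementation: every class and function name here (`IterationState`, `TaskContext`, `TaskResult`, `TaskState`, `DoubleTask`, `run_cycle`) is hypothetical, and plain Python lists stand in for RDDs so that the sketch runs without a Spark installation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class IterationState:
    """State object: named RDD handles that survive from cycle to cycle."""
    rdds: Dict[str, Any] = field(default_factory=dict)


@dataclass
class TaskContext:
    """Task context object: the RDD handles visible to tasks in this cycle."""
    rdds: Dict[str, Any] = field(default_factory=dict)

    def load_from(self, state: IterationState) -> None:
        # Step 1: read the iteration RDD information from the state object.
        self.rdds.update(state.rdds)


@dataclass
class TaskResult:
    """Task result object: the RDD handles produced by one task."""
    rdds: Dict[str, Any] = field(default_factory=dict)


class DoubleTask:
    """Hypothetical task that knows only the names of its input and output."""

    def __init__(self, source: str, target: str) -> None:
        self.source = source
        self.target = target

    def run(self, ctx: TaskContext) -> TaskResult:
        # Step 2: look up the input in the context, store the output in a result.
        data = ctx.rdds[self.source]
        # Assumption: with a real RDD this would be a transformation such as map.
        return TaskResult({self.target: [x * 2 for x in data]})


class TaskState:
    """Parses a task result and writes each entry back to the state object."""

    def absorb(self, result: TaskResult, state: IterationState) -> None:
        # Step 3: store each RDD handle under its name in the state object.
        for name, rdd in result.rdds.items():
            state.rdds[name] = rdd


def run_cycle(state: IterationState, tasks: List[DoubleTask]) -> IterationState:
    """Run one iteration cycle over the given tasks."""
    ctx = TaskContext()
    ctx.load_from(state)
    for task in tasks:
        result = task.run(ctx)
        TaskState().absorb(result, state)
        ctx.load_from(state)  # make this task's output visible to later tasks
    return state


state = IterationState({"input": [1, 2, 3]})
# Data passes between two tasks within one cycle...
run_cycle(state, [DoubleTask("input", "doubled"), DoubleTask("doubled", "quadrupled")])
# ...and from one cycle to the next through the state object.
run_cycle(state, [DoubleTask("quadrupled", "input")])
```

In this sketch no task holds a reference to another task; each one only names the entries it reads and writes, which is what would let tasks be developed as separate modules.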