Deep Video Understanding With Model Efficiency And Sparse Active Labeling