Long-term temporal modeling for egocentric video action segmentation