This dataset contains 19536k screenshot image and OCR text pairs. Each pair was augmented 2 times: the text via Groq, and the images with the Albumentations image augmentation library in Python (test_19536k_aug_2_groq_all).