The idea is to insert a span in the text layer with an aria-role set to img and use the bounding box provided by the attribute field in the tag dict in order to have non-null dimensions for the image to make it "visible".