Using AI for image transcripts, yay or nay?

Gonzako@lemmy.world · 1 month ago

Rain World: Slugcat Game@lemmy.world · 6 days ago

i used tesseract and then proofread it for text images
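
A minimal sketch of that OCR-then-proofread workflow, assuming the `tesseract` command-line tool is installed and on your PATH. The function names and the `meme.png` file name are made up for illustration; the output still needs a human read-through before posting.

```python
import shutil
import subprocess


def build_tesseract_command(image_path: str, lang: str = "eng") -> list[str]:
    # Passing "stdout" as the output base makes the tesseract CLI print
    # the recognized text instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    # Fail early with a readable message if the CLI is missing.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Something like `print(ocr_image("meme.png"))` gives you a draft to proofread, not a finished transcript.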

originalucifer@moist.catsweat.com · 1 month ago

personally, this is the kind of laser-focused tooling it's good for. LLMs are going to be critical to assisting the disabled in many contexts.

hendrik@palaver.p3x.de · 1 month ago

I’d ask someone who needs these transcriptions first. I tend more towards “Nay”. I mean if they want AI transcriptions, I guess they could just run their own AI. And that way they get to choose between human and AI ones. I’m kind of against flooding the internet with AI content as long as the recipients can do it themselves.

x74sys@programming.dev · 1 month ago

In my opinion, no. It has to be heavily curated. You’re not saving yourself a lot of work if you have to read it word by word (and probably correct stuff) anyway.

I think just one very short sentence describing what’s on there (it doesn’t have to be detailed) is a lot better than whatever an LLM will give you.

Kierunkowy74@piefed.zip · 1 month ago

Check your output as it may be less accurate than your effort.

AI is able to extensively describe a photo, like those published on !pics@lemmy.world, but fails at seeing what part of it is actually important, or recognising the point of a meme. It will save you many keystrokes, but will probably still need to be manually corrected.

Doorknob@lemmy.world · 1 month ago

By transcribing, do you mean describing what is in a picture, or transcribing text in a picture?

For the former, I can’t really imagine an image you couldn’t describe for accessibility within a sentence, and for the latter, OCR could do the job equally well.

I’m not saying this to just push the view that neural networks are no good for anything btw. For translation, for example, or text to speech/speech to text, I genuinely think they’re a revelation, and they need very little compute to perform those functions.

Auster@thebrainbin.org · 1 month ago

Imo it’s a good use. But do make sure you read the outputs thoroughly. Even hand-made OCR tools can go crazy sometimes. Also if the AI can be fully offline / self-hosted, that’s even better imo.

placebo@lemmy.zip · 1 month ago

AI is great for this. We shouldn’t put people with disabilities at a disadvantage because of the anti-AI hysteria.

quediuspayu@lemmy.dbzer0.com · 1 month ago

If you can run it on your computer for a job that you would do anyway, I don’t see why not.

qaz@lemmy.world · 1 month ago

I’d say go ahead but make sure it produces accurate enough results and make sure to add something like [AI Transcribed] in front so people can take the potential for additional errors into consideration when reading it.

Also, if you’re using an online service make sure you’re using something that doesn’t use it as training data. Many (probably almost all) artists / photographers won’t appreciate that.
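
A small sketch of the labeling idea, assuming the transcript is already a plain string. The `[AI Transcribed]` wording comes from the comment above; the function and constant names are hypothetical.

```python
AI_LABEL = "[AI Transcribed]"


def label_transcript(text: str, label: str = AI_LABEL) -> str:
    # Trim stray whitespace so the label sits directly before the text.
    cleaned = text.strip()
    # Avoid stacking the label if the text was already tagged.
    if cleaned.startswith(label):
        return cleaned
    return f"{label} {cleaned}"
```

Running a transcript through it twice leaves a single label, so it is safe to call on text that may already be tagged.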

vala@lemmy.dbzer0.com · 1 month ago

You have a unique advantage over a vision-impaired person in using AI for this: if the generated text is wrong, you know and can correct it.

pruwyben@discuss.tchncs.de · 1 month ago

*yea or nay

HubertManne@piefed.social · 24 days ago

wouldn’t using standard ocr be easier?

KatherinaReichelt@feddit.org · 1 month ago

I think that technology can really help us here. OCR on images is mostly solved. If you know what PaddleOCR can do, those people on Mastodon who are whining about others not including an image description for a screenshot seem really annoying. It is possible to do this directly on your computer at no cost and without beefy hardware. So no need to try to force everyone else to include transcriptions for screenshots, no need to attack other people, just do it yourself and enjoy the text on the screenshot. Technology can really help us here.

This also does kind of apply to AI image descriptions. Try it and put an image into Gemini and ask it to describe it. You will be surprised. AI can totally give you a workable description of an image. The problem here is that those AI tools can get quite expensive when you are using them a lot and that many disabled people do not have much money. So in my opinion it totally is ok to include AI image descriptions.

I think that there are too many people in the fediverse who do not know the current state of the technology and hate AI for maybe the right reasons, but who are missing out on how it could help them.

Tamlyn@lemmy.zip · 1 month ago

A lot of artists don’t want their art to be used by ai. You can’t prevent that if you let ai summarize your images. So don’t use ai for that.

Gonzako@lemmy.world · 1 month ago

I was actually thinking of using a self-hosted LLM for these tasks. I wanna dig again into it and I got access to computers on the cheap

Rain World: Slugcat Game@lemmy.world · 6 days ago

LLMs can’t see. are you talking about one with a vision thingamabob bolted on? you could just use ocr