Day 14 - Thursday, January 14, 2021

2021-01-14

A few days ago I saw a video on Reddit in which a 32-year-old programmer named Yann LeCun demonstrates the "World's first Convolutional Network for Text Recognition". I was amazed that the video was 27 years old. Back then they had computers with far less processing power, and they were still able to do this. And here I am, with a much faster computer and much better software, trying my best to extract text from some images and still having no luck.

I decided to learn PyTesseract (Python-tesseract is an optical character recognition (OCR) tool for Python that wraps Tesseract, an open-source OCR engine originally developed at HP and later sponsored by Google) because I had some images at work whose data I had to enter manually into the accounting software. Then, just like every lazy programmer who spends 8 hours trying to automate a 5-minute task, I tried to automate it by extracting the text from the images and adding it to the accounting software with Python.
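Once the text is out of the image, the second half of the task is turning it into values the accounting software can accept. Below is a minimal sketch of that step, assuming (hypothetically) that the images contain lines like `Total: 1,296.00`; the function name `parse_amounts`, the label format and the sample text are all my own illustration, not anything from pytesseract or a real accounting package.

```python
import re
from decimal import Decimal

# Hypothetical line format: "Label: 1,234.56" (an optional "$" before the number).
# The colon is required so lines like "Invoice 42" are not picked up.
AMOUNT_RE = re.compile(
    r"^\s*(?P<label>[A-Za-z][A-Za-z ]*?)\s*:\s*\$?(?P<amount>\d[\d,]*(?:\.\d+)?)\s*$"
)


def parse_amounts(text):
    """Return {label: Decimal} for every OCR line that looks like 'Label: amount'."""
    result = {}
    for line in text.splitlines():
        match = AMOUNT_RE.match(line)
        if match:
            label = match.group("label").strip().lower()
            # Drop thousands separators before converting; Decimal avoids float rounding.
            result[label] = Decimal(match.group("amount").replace(",", ""))
    return result
```

The input would be whatever `pytesseract.image_to_string(...)` returns for an image. Using `Decimal` instead of `float` keeps money values exact, which matters when the numbers end up in accounting software.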

The first obstacle was extracting the text from the image. Going by the example in the pytesseract documentation it looked like an easy task, but it was not, especially when the image is of poor quality and some words are hard to read even for a human.
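One common trick for poor-quality images is to clean them up before handing them to Tesseract, for example by converting to grayscale and binarizing (pure black text on a pure white background). The sketch below uses Otsu's method to pick the threshold automatically; that is a standard image-processing technique I am swapping in, not something prescribed by the pytesseract docs. It assumes Pillow, pytesseract and the Tesseract binary are installed, and the names `otsu_threshold` and `ocr_image` are hypothetical. The imports sit inside `ocr_image` so the threshold helper can be tried without Tesseract.

```python
def otsu_threshold(histogram):
    """Pick the gray level that best separates a 256-bin histogram into two classes.

    Otsu's method: choose t maximizing the between-class variance
    w_bg * w_fg * (mean_bg - mean_fg)^2, where pixels <= t are "background".
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_bg, sum_bg = 0, 0.0
    best_t, best_var = 0, -1.0
    for t, count in enumerate(histogram):
        weight_bg += count
        if weight_bg == 0:
            continue  # no pixels at or below t yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += t * count
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def ocr_image(path, lang="eng"):
    """Grayscale + Otsu binarization, then run Tesseract on the cleaned image."""
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        gray = img.convert("L")  # 8-bit grayscale, so histogram() has 256 bins
    t = otsu_threshold(gray.histogram())
    black_white = gray.point(lambda p: 255 if p > t else 0)
    return pytesseract.image_to_string(black_white, lang=lang)
```

A global threshold like this helps with faded or low-contrast scans but can struggle with uneven lighting, where a local (adaptive) threshold usually does better.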

So I spent some time reading articles with examples explaining how to extract text from images, but too many new concepts came my way that I didn't understand, so I gave up that day and left it for the future. Today I wanted to do the same thing again, for those same images, and I am thinking of properly learning it this time.

What I Learned Today

💻 Programming

PyTesseract - I didn't learn all of it; I just started learning it, and I think it will take some time, a long time, to really get the results I expect from it, but that's okay as long as I learn something from it. Here are some good articles I read today on extracting text from images: One, Two.

🗾 Language [日本語 (Japanese)]

お土産 (おみやげ) Souvenir. 土: Soil.
億 (おく) Hundred Million.
意味 (いみ) Meaning.