Spaces: Running on Zero
Output of ocrmypdf is never used (#2)
Opened by Didier
The output of ocrmypdf.ocr() is never used. The OCR'd PDF is written to out_pdf_file, but the text is then extracted from the original input_file, so the OCR step has no effect on the result:
out_pdf_file = input_file.replace(".pdf", "_ocr.pdf")
ocrmypdf.ocr(input_file, out_pdf_file, force_ocr=True)
text = extract_text_from_pdf(PdfReader(input_file)) # <--- text extracted from original file instead of out_pdf_file
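A possible fix is to read from out_pdf_file after OCR. The sketch below is a suggestion, not the Space's actual code. It also derives the output name with os.path.splitext, because input_file.replace(".pdf", "_ocr.pdf") rewrites every ".pdf" in the path (for example in a directory name). The helper names ocr_output_path, ocr_then_extract and ocr_then_extract_with_libs are hypothetical. The last function assumes PdfReader comes from pypdf and uses a simple page-by-page join as a stand-in for the Space's own extract_text_from_pdf, whose implementation isn't shown here.

```python
import os
from typing import Callable


def ocr_output_path(input_file: str) -> str:
    """Derive '<name>_ocr.pdf' from the input path (hypothetical helper).

    os.path.splitext only touches the final extension, unlike
    str.replace(".pdf", "_ocr.pdf"), which rewrites every ".pdf" in the path.
    """
    root, ext = os.path.splitext(input_file)
    return f"{root}_ocr{ext}"


def ocr_then_extract(
    input_file: str,
    run_ocr: Callable[[str, str], object],
    extract_text: Callable[[str], str],
) -> str:
    """Run OCR into a new file, then extract text from that OCR'd file."""
    out_pdf_file = ocr_output_path(input_file)
    run_ocr(input_file, out_pdf_file)
    # The fix: read the OCR output, not the original input_file.
    return extract_text(out_pdf_file)


def ocr_then_extract_with_libs(input_file: str) -> str:
    """Wiring with real libraries (assumes ocrmypdf and pypdf are installed)."""
    import ocrmypdf
    from pypdf import PdfReader

    def run_ocr(src: str, dst: str) -> object:
        return ocrmypdf.ocr(src, dst, force_ocr=True)

    def extract_text(path: str) -> str:
        # Stand-in for the Space's extract_text_from_pdf helper (assumption).
        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    return ocr_then_extract(input_file, run_ocr, extract_text)
```

Passing run_ocr and extract_text in as callables keeps the file-routing logic testable without Tesseract installed. In the Space itself, the one-line change is simply PdfReader(out_pdf_file) instead of PdfReader(input_file).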