Error: cannot import name 'PDFDocument' from 'pdfminer.pdfparser'


I need to extract text from PDF files, and pdfminer.six had been working well for both text paragraphs and tables. However, the import of PDFDocument now fails with this error:

ImportError: cannot import name 'PDFDocument' from 'pdfminer.pdfparser' (C:\Users\ashok\python\lib\site-packages\pdfminer\pdfparser.py)

I uninstalled and reinstalled Anaconda, pdfminer.six, and other related packages several times. About 2-3 days ago the code suddenly worked, but then the error resurfaced. I normally work on Windows 10, so I also tried Ubuntu Linux and hit the same error there.

After struggling for hours, I managed to fix the problem. This post discusses possible solutions for this error.

Solution 1:

We made the following changes to address the issue.

First, we updated our import statements so that PDFDocument comes from pdfminer.pdfdocument rather than pdfminer.pdfparser:

from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdfpage import PDFPage

Additionally, we now pass the PDFParser to the PDFDocument constructor (here pdf_file is a file object opened in binary mode):

parser = PDFParser(pdf_file)
doc = PDFDocument(parser)

Furthermore, we modified the loop to create pages using the PDFPage class:

for page in PDFPage.create_pages(doc):
    pass  # process each page here

According to the pdfminer documentation, PDFDocument should be imported from pdfminer.pdfdocument.
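If you are unsure which module exports a class in your installed version, a small check like the following may help. This is only a sketch: the helper name find_exporting_module is made up for illustration and is not part of pdfminer.

```python
import importlib


def find_exporting_module(name, candidate_modules):
    """Return the first module in candidate_modules that defines name, else None."""
    for module_name in candidate_modules:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module not importable in this environment; try the next one
        if hasattr(module, name):
            return module_name
    return None


if __name__ == "__main__":
    # Hypothetical usage: ask which of the two modules provides PDFDocument.
    candidates = ["pdfminer.pdfparser", "pdfminer.pdfdocument"]
    print(find_exporting_module("PDFDocument", candidates))
```

If this prints None, neither module could be imported or neither defines the class, which suggests that the installation itself is the problem rather than the import line.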

By correctly importing the required modules and adjusting the instantiation of the PDFDocument object, we ensure compatibility and proper functioning of the code.
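Putting the pieces together, a minimal sketch of per-page text extraction could look like the code below. It assumes a reasonably recent pdfminer.six; the function name extract_page_texts is our own, and the layout settings are simply the LAParams defaults, not something the original code is known to have used.

```python
import io


def extract_page_texts(pdf_path):
    """Return a list holding the extracted text of each page in pdf_path."""
    # Imported inside the function so a broken install fails at call time.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfdocument import PDFDocument  # not pdfminer.pdfparser
    from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
    from pdfminer.pdfpage import PDFPage
    from pdfminer.pdfparser import PDFParser

    texts = []
    with open(pdf_path, "rb") as pdf_file:
        parser = PDFParser(pdf_file)
        doc = PDFDocument(parser)
        resource_manager = PDFResourceManager()
        for page in PDFPage.create_pages(doc):
            # A fresh buffer and converter per page keeps the page texts separate.
            buffer = io.StringIO()
            device = TextConverter(resource_manager, buffer, laparams=LAParams())
            PDFPageInterpreter(resource_manager, device).process_page(page)
            texts.append(buffer.getvalue())
            device.close()
    return texts
```

Calling extract_page_texts("example.pdf") (a placeholder file name) should return one string per page. If only the full text is needed, pdfminer.high_level.extract_text is a shorter alternative.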

Solution 2:

To fix the "Error: cannot import name 'PDFDocument' from 'pdfminer.pdfparser'" error, you can follow these steps:

  1. Check PDFMiner Version: Ensure that you are using a compatible version of pdfminer.six. The PDFDocument class might not be exported by the module you are importing it from in the version you are using.
  2. Update PDFMiner: If you are not using the latest version of pdfminer.six, update it. You can do this using pip:
       pip install pdfminer.six --upgrade
  3. Correct Import Statement: Make sure you are importing the PDFDocument class from the correct module, which is pdfminer.pdfdocument and not pdfminer.pdfparser:
       from pdfminer.pdfdocument import PDFDocument
  4. Verify Installation: After updating, verify that pdfminer.six is installed correctly in your Python environment. You can check installed packages using:
       pip list
  5. Check Python Path: Ensure that Python can locate the pdfminer package correctly. If necessary, adjust your Python path settings.
  6. Reinstallation: If the issue persists, try uninstalling and reinstalling pdfminer.six:
       pip uninstall pdfminer.six
       pip install pdfminer.six
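
The version and path checks in the steps above can also be done from inside Python, which confirms that the interpreter running your script sees the same installation as pip. The following is a rough sketch using only the standard library (Python 3.8 or later); the helper names installed_version and package_location are our own.

```python
import importlib.util
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def package_location(package):
    """Return the file a top-level package would be loaded from, or None."""
    spec = importlib.util.find_spec(package)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("pdfminer.six version:", installed_version("pdfminer.six"))
    print("pdfminer loaded from:", package_location("pdfminer"))
```

If the reported location is not inside the environment you expect, the script is probably running under a different Python interpreter than the one pip installed into.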

By following these steps, you should be able to resolve the "Error: cannot import name 'PDFDocument' from 'pdfminer.pdfparser'" error.