To solve the error "Excel file format cannot be determined, you must specify an engine manually" when using Pandas and glob to read Excel files, follow these steps:
- Install Required Libraries: Ensure that Pandas is installed in your Python environment, along with the engine library it needs for your files: openpyxl for .xlsx files and xlrd for .xls files.
- Check File Extensions: Ensure that all the Excel files you're trying to read have valid file extensions (.xls or .xlsx).
- Specify Engine Manually: When reading Excel files with Pandas, explicitly set the engine parameter of read_excel: 'openpyxl' for .xlsx files, or 'xlrd' for legacy .xls files (xlrd 2.0 and later reads only .xls).
- Use glob to Read Multiple Files: If you're using glob to read multiple Excel files, make sure the pattern matches only real Excel files, pass each matched path to the read_excel function, and set the engine parameter as described in the previous step.
- Upgrade Pandas and Dependencies: If you're still encountering issues, consider upgrading Pandas and the engine libraries it depends on (openpyxl, xlrd) to their latest versions.
- Check File Integrity: Make sure that the Excel files you're trying to read are not corrupted or damaged.
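As a rough illustration of the engine step, here is a minimal sketch that picks the engine from the file extension before calling read_excel. The helper names pick_engine and read_workbook and the extension-to-engine mapping are assumptions made for this example, not part of Pandas; it also assumes openpyxl and xlrd are installed.

```python
from pathlib import Path

import pandas as pd

# Assumed mapping from file extension to the pandas engine name.
ENGINE_BY_SUFFIX = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
    ".xls": "xlrd",
}


def pick_engine(path):
    """Return the engine name for a path, based on its extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unsupported Excel extension: {suffix!r}") from None


def read_workbook(path):
    """Read the first sheet of one workbook, passing the engine explicitly."""
    return pd.read_excel(path, engine=pick_engine(path))
```

Raising on an unknown extension, instead of falling back to a default engine, makes a stray non-Excel file in the folder show up as a clear error that names the extension.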
When an Excel file is opened, for example, by MS Excel, a hidden temporary file is created in the same directory:
~$datasheet.xlsx
So, when I run the code to read all the files from the folder, glob also picks up this temporary file, which is not a valid workbook, and Pandas gives me the error:
Excel file format cannot be determined, you must specify an engine manually.
When all files are closed and no hidden temporary files like ~$filename.xlsx are present in the same directory, the code works perfectly.
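If closing every workbook first is not practical, one option is to skip the temporary files when collecting paths. The following is a minimal sketch, assuming the files are .xlsx and that the temporary files always start with ~$ as in the example above; list_excel_files is a hypothetical helper name.

```python
from pathlib import Path


def list_excel_files(folder):
    """Return the .xlsx paths in folder, skipping temporary files such as ~$datasheet.xlsx."""
    return sorted(
        path
        for path in Path(folder).glob("*.xlsx")
        if not path.name.startswith("~$")  # temporary file created while a workbook is open
    )
```

Each returned path can then be passed to read_excel as usual, so the loop keeps working even while one of the workbooks is open in Excel.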