TableStructureModel Initialization Fails: "Cannot Copy Out Of Meta Tensor" When Using CPU Device


Introduction

When processing PDFs with table structure extraction enabled, TableStructureModel initialization fails with the error: "Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device." The failure occurs in the TFPredictor while loading the TableFormer model, and only when CPU is selected as the accelerator device. This article covers the steps to reproduce the issue, the full error message, and a suggested fix.

Steps to Reproduce

To reproduce this issue, follow these steps:

  1. Create a DocumentConverter instance with table structure extraction enabled by setting the do_table_structure option to True.
  2. Set the accelerator device to CPU by creating an AcceleratorOptions instance with device=AcceleratorDevice.CPU.
  3. Process a PDF document containing tables using the convert method of the DocumentConverter instance.
  4. The process fails with the PyTorch error about meta tensors.

Sample Code to Reproduce

Here is a sample code snippet that reproduces this issue:

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import AcceleratorDevice, AcceleratorOptions, PdfPipelineOptions

# Configure options with table structure enabled
pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True  # Enabling this option triggers the error
pipeline_options.table_structure_options.do_cell_matching = True
pipeline_options.accelerator_options = AcceleratorOptions(num_threads=4, device=AcceleratorDevice.CPU)

# Create converter
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)

# Try to process a PDF - this will fail
result = converter.convert("path/to/pdf_with_tables.pdf")

Docling Version

The docling version used to reproduce this issue is 2.30.0.

Python Version

The Python version used to reproduce this issue is 3.12.7.

Full Error Message

Here is the full error message:

ERROR - Error processing Aula_Kanban in Practice.pdf: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
Traceback (most recent call last):
  File "/home/rafael-dias/anaconda3/envs/ipea/lib/python3.12/site-packages/docling_ibm_models/tableformer/data_management/tf_predictor.py", line 178, in _load_model
    model = TableModel04_rs(self._config, self._init_data, self._device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rafael-dias/anaconda3/envs/ipea/lib/python3.12/site-packages/docling_ibm_models/tableformer/models/table04_rs/tablemodel04_rs.py", line 40, in __init__
    self._encoder = Encoder04(self._enc_image_size, self._encoder_dim).to(device)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rafael-dias/anaconda3/envs/ipea/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
[...]
  File "/home/rafael-dias/anaconda3/envs/ipea/lib/python3.12/site-packages/torch/nn/modules/module.py", line 942, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/home/rafael-dias/anaconda3/envs/ipea/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1348, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

Environment Information

Here is the environment information:

  • OS: Linux (also reproduced on Windows with the same error)
  • PyTorch device: CPU (CUDA not available or not properly configured)
  • Docling uses newer PyTorch meta-device features but appears to have compatibility issues when initializing models on CPU.

Suggested Fix

The suggested fix is to update the TableModel04_rs class in docling_ibm_models/tableformer/models/table04_rs/tablemodel04_rs.py so that modules created on the meta device are moved with torch.nn.Module.to_empty() instead of torch.nn.Module.to(). Because to_empty() allocates uninitialized storage on the target device, the checkpoint weights must be loaded into the module afterwards.
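The general pattern behind that fix can be sketched with a stand-in module. This is a minimal sketch, not docling's actual code: Encoder04 is replaced by an nn.Linear, and the zero tensors stand in for real checkpoint weights.

```python
import torch
import torch.nn as nn

# Create a module on the meta device: its parameters exist only as
# shapes and dtypes, with no backing storage.
with torch.device("meta"):
    encoder = nn.Linear(16, 32)  # stand-in for Encoder04

# encoder.to("cpu") would raise NotImplementedError here, because a meta
# tensor has no data to copy. to_empty() instead allocates uninitialized
# storage on the target device.
encoder = encoder.to_empty(device="cpu")

# The allocated parameters are uninitialized, so real weights must be
# loaded afterwards (zeros here; normally the model checkpoint).
encoder.load_state_dict({
    "weight": torch.zeros(32, 16),
    "bias": torch.zeros(32),
})
```

Applying the same two steps (to_empty(), then load_state_dict()) inside TableModel04_rs.__init__ should allow the TableFormer submodules to be materialized on CPU.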

Q: What is the issue with TableStructureModel initialization when using CPU device?

A: The issue is that the TableStructureModel initialization fails with the error: "Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device." This occurs specifically in the TFPredictor when trying to load the TableFormer model, and it happens when using CPU as the accelerator device.

Q: What are the steps to reproduce this issue?

A: Enable table structure extraction (do_table_structure = True) on a DocumentConverter, set the accelerator device to AcceleratorDevice.CPU via AcceleratorOptions, and convert a PDF containing tables. The conversion fails with the PyTorch meta-tensor error. See the "Steps to Reproduce" section above for details.

Q: What is the full error message?

A: The full traceback is shown in the "Full Error Message" section above. The key exception, raised from Encoder04(...).to(device) in tablemodel04_rs.py while TFPredictor._load_model constructs the model, is:

NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

Q: What is the suggested fix?

A: The suggested fix is to update the TableModel04_rs class in docling_ibm_models/tableformer/models/table04_rs/tablemodel04_rs.py to use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when initializing the model on CPU devices.

Q: Why is this issue occurring?

A: This issue is occurring because docling is using newer PyTorch meta device features but seems to have compatibility issues when initializing models on CPU devices.

Q: How can I resolve this issue?

A: To resolve this issue, you can update the TableModel04_rs class to use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when initializing the model on CPU devices.

Q: In what environment does this issue occur?

A: The issue was observed in the following environment:

  • OS: Linux (also reproduced on Windows with the same error)
  • PyTorch device: CPU (CUDA not available or not properly configured)
  • Docling uses newer PyTorch meta-device features but appears to have compatibility issues when initializing models on CPU.