Implementing completely offline use of tesseract.exe resulted in an "Error opening data file" / "eng.traineddata" issue #970

@22480

Description

Describe the bug:
I want to use tesseract.js fully offline, so I configured the worker to load everything from local paths:

const recognizeText = async (imageUrl: string) => {
    const worker = await Tesseract.createWorker("chi_sim", undefined, {
        workerPath: "/tessdata/tesseract.js/dist/worker.min.js",
        corePath: "/tessdata/tesseract.js-core",
        langPath: "/tessdata/tesseract-lang",
        logger: m => console.log(m),
    })

    const {
        data: { text },
    } = await worker.recognize(imageUrl)
    setRecognizedText(text)

    await worker.terminate()
}

To Reproduce:
Steps to reproduce the behavior:

  1. Create a tessdata folder in the public folder.
  2. Place the local resource files in this folder:
    tesseract-lang, tesseract.js, tesseract.js-core
  3. Run the application.
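
For a folder layout like the one in the steps above, it can help to list the language-data URLs that must resolve on the local server. The following is a minimal sketch, not part of Tesseract.js: `langDataUrls` is a hypothetical helper, and it assumes the worker requests `<langPath>/<lang>.traineddata.gz` unless `gzip: false` is passed in the options. The `eng` entry is included because the error in the title mentions eng.traineddata.

```typescript
// Hypothetical helper (not a Tesseract.js API): lists the language-data URLs
// that would need to be served locally for offline use.
// Assumption: the worker requests `<langPath>/<lang>.traineddata.gz` by default,
// and `<langPath>/<lang>.traineddata` when gzip is disabled.
function langDataUrls(langPath: string, langs: string[], gzip: boolean = true): string[] {
    const base = langPath.replace(/\/+$/, "") // drop any trailing slashes
    const suffix = gzip ? ".traineddata.gz" : ".traineddata"
    return langs.map(lang => `${base}/${lang}${suffix}`)
}

// Paths taken from the report; "eng" is what the error message refers to.
const urls = langDataUrls("/tessdata/tesseract-lang", ["chi_sim", "eng"])
console.log(urls)
```

Opening each listed URL in the browser (or checking the Network tab) is a quick way to see whether the file the worker asks for exists under that exact name.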

Complete code:
const inputRefOCR = useRef(null)
const [imageData, setImageData] = useState("")
const [recognizedText, setRecognizedText] = useState("")

const handleCapture = () => {
    if (inputRefOCR.current.files && inputRefOCR.current.files.length > 0) {
        const file = inputRefOCR.current.files[0]
        const reader = new FileReader()
        reader.onload = e => {
            setImageData(e.target.result)
            recognizeText(e.target.result)
        }
        reader.readAsDataURL(file)
    }
}

const recognizeText = async (imageUrl: string) => {
    const worker = await Tesseract.createWorker({
        workerPath: "/tessdata/tesseract.js/dist/worker.min.js",
        corePath: "/tessdata/tesseract.js-core",
        langPath: "/tessdata/tesseract-lang",
        logger: m => console.log(m),
    })

    const {
        data: { text },
    } = await worker.recognize(imageUrl)
    setRecognizedText(text)

    await worker.terminate()
}

Console error display:
Screenshot 2024-10-28 144514

Expected behavior:
Tesseract.js should work fully offline, loading the worker, core, and language data from the local tessdata folder.

Device Version:

  • Windows 11
  • Chrome
