3/15/2023

PDF extract text

To extract text from every page of a document, load the PDF as a stream, iterate over the loaded pages, and append the text of each page.

    // Load the file as a stream (the resource name is left empty in the source;
    // GetTypeInfo() needs "using System.Reflection;")
    Stream docStream = typeof(App).GetTypeInfo().Assembly.GetManifestResourceStream("");
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
    // Load the page collection
    PdfLoadedPageCollection loadedPages = loadedDocument.Pages;
    TextLineCollection lineCollection = new TextLineCollection();
    string extractedText = string.Empty;
    // Extract text from the existing PDF document pages
    foreach (PdfLoadedPage loadedPage in loadedPages)
    {
        // The loop body is missing in the source; appending each page's text is an assumption
        extractedText += loadedPage.ExtractText(out lineCollection);
    }
    // Close the document
    loadedDocument.Close(true);

Working with layout based text extraction

You can extract text from a given PDF page based on its layout by using the ExtractText(bool) overload, for example page.ExtractText(true). With this overload, the text is extracted in the same layout in which it is displayed in a reader application.

Each extracted line is returned as a TextLine, which exposes the bounds, font details, text, and words of that line.

    // Gets a specific line from the collection filled by ExtractText(out lineCollection)
    TextLine line = lineCollection.TextLine[0];
    // Gets bounds of the line
    RectangleF lineBounds = line.Bounds;
    // Gets the font name of the text
    string fontName = line.FontName;
    // Gets the font size of the text
    float fontSize = line.FontSize;
    // Gets the font style of the text
    FontStyle fontStyle = line.FontStyle;
    // Gets the text in the line
    string text = line.Text;
    // Gets the collection of words in the line
    List<TextWord> textWordCollection = line.WordCollection;

You can get a single word and its properties by using TextWord.

    // Load the existing PDF document
    PdfLoadedDocument m_loadedDocument = new PdfLoadedDocument(stream);
    // Get the first page of the loaded PDF document
    PdfPageBase page = m_loadedDocument.Pages[0];
    TextLineCollection lineCollection = new TextLineCollection();
    // Extract text from the first page
    string m_extractedText = page.ExtractText(out lineCollection);
    // Gets a specific line from the collection
    TextLine line = lineCollection.TextLine[0];
    // Gets the collection of words in the line
    List<TextWord> textWordCollection = line.WordCollection;
    // Gets a word from the collection using its index
    TextWord textWord = textWordCollection[0];
    // Gets bounds of the word
    RectangleF wordBounds = textWord.Bounds;
    // Gets the font name of the word
    string wordFontName = textWord.FontName;
    // Gets the font size of the word
    float wordFontSize = textWord.FontSize;
    // Gets the font style of the word
    FontStyle wordFontStyle = textWord.FontStyle;
    // Gets the text of the word
    string wordText = textWord.Text;
    // Gets the glyph details of the word
    List<TextGlyph> textGlyphCollection = textWord.Glyphs;

You can download a complete working sample from GitHub.

You can get a single character and its properties by using TextGlyph. Each TextGlyph comes from the glyph collection of a TextWord, such as textGlyphCollection in the snippet above, after loading the document and extracting the first page in the same way.
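As a rough illustration of reaching individual characters through TextGlyph, here is a minimal sketch. It assumes the Syncfusion PDF package is referenced, that TextWord exposes a Glyphs list, and that TextGlyph exposes Text and Bounds. The file name "input.pdf" and the class name GlyphSketch are placeholders, not taken from the original post.

```csharp
using System;
using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

class GlyphSketch
{
    static void Main()
    {
        // "input.pdf" is a placeholder path used only for illustration.
        using (FileStream stream = File.OpenRead("input.pdf"))
        {
            // Load the existing PDF document and take its first page.
            PdfLoadedDocument loadedDocument = new PdfLoadedDocument(stream);
            PdfPageBase page = loadedDocument.Pages[0];

            // Extract the text and collect line-level details at the same time.
            TextLineCollection lineCollection = new TextLineCollection();
            page.ExtractText(out lineCollection);

            // Walk lines, then words, then glyphs.
            foreach (TextLine line in lineCollection.TextLine)
            {
                foreach (TextWord word in line.WordCollection)
                {
                    // Assumed members: TextWord.Glyphs, TextGlyph.Text, TextGlyph.Bounds.
                    foreach (TextGlyph glyph in word.Glyphs)
                    {
                        Console.WriteLine($"'{glyph.Text}' at {glyph.Bounds}");
                    }
                }
            }

            // Close the document and release its resources.
            loadedDocument.Close(true);
        }
    }
}
```

Walking the hierarchy in this order (line, word, glyph) mirrors how the collections nest, so the same loop can be cut short at the word level when per-character detail is not needed.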