ANmaRemoveNestedTags

Removes identical tags that are nested directly inside each other in an HTML document, preserving the values inside the inner tag.
Only the outer tag is removed; the inner tag and its contents are kept.
Returns the passed string with the outer tags removed.

Public

Tested

Original Work
Function ANmaRemoveNestedTags(FromString, Optional ForTag = "span")
    ' Remove tags that are identical and nested inside each other, preserving values inside the inner tag.
    '
    ' Motivation: some companies' EDGAR filings nest <ix ...> tags, with the number inside the innermost tag.
    ' Requires the helper functions VBInstr() and CutString(), which are not shown here.
    Rett = FromString
    Starting = 1
    TT1 = "<" & ForTag '& " "   ' start of an opening tag, e.g. <span
    TT2 = "</" & ForTag         ' start of a closing tag, e.g. </span
    TT3 = ">"                   ' end of a tag
    Rett1 = CutString(FromString, , TT1)
    Do
        Tag1 = CutString(FromString, TT1, TT2, Starting)
        Tag2 = CutString(Tag1, TT3)
        If Tag1 = FromString Then Exit Do
        If Tag2 = FromString Then Exit Do
        Tag3 = VBInstr(TT1, Tag1) ' we should not have starting tag inside my tag. if we do, then we found nested tag
        If Tag3 = 0 Then '    If Tag2 > "" Then
            ' Not nested: keep this tag, then append the text that follows
            ' its closing tag up to the next opening tag
            Tag4 = CutString(FromString, TT2, TT1, Starting)
            Tag4 = CutString(Tag4, TT2, TT1, 1)
            Rett1 = Rett1 & TT1 & Tag1 & TT2 & Tag4
        Else
            ' Nested: append nothing, so the outer opening tag is dropped;
            ' the inner tag is picked up on the next pass of the loop
            Rett1 = Rett1 ' & CutString(FromString, TT2, TT1, Starting)
        End If
        Starting = VBInstr(TT1, FromString, Starting + 1)
        If Starting = 0 Then
            Rett = Rett1 ' & CutString(FromString, TT2, , Starting)
            Exit Do
        End If
        DoEvents
    Loop
    ANmaRemoveNestedTags = Rett
End Function
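
Because VBInstr() and CutString() are not listed here, the function above cannot be run on its own. The following is a minimal stand-alone sketch of the same idea that uses only the built-in InStr, Mid$, Left$ and StrComp. It is not the original implementation: the names RemoveOuterNestedTagSketch, MatchingClosePos and SkipWhitespace are hypothetical, and it assumes well-formed markup, no ">" inside attribute values, and no other tag whose name begins with ForTag. It removes an outer tag only when that tag wraps exactly one inner tag of the same name.

```vb
Option Explicit

' Hypothetical helper: first position at or after Pos that is not whitespace.
Private Function SkipWhitespace(ByVal Text As String, ByVal Pos As Long) As Long
    Do While Pos <= Len(Text)
        If InStr(1, " " & vbTab & vbCr & vbLf, Mid$(Text, Pos, 1)) = 0 Then Exit Do
        Pos = Pos + 1
    Loop
    SkipWhitespace = Pos
End Function

' Hypothetical helper: position of the closing tag that matches an opening tag
' whose content starts at StartPos, or 0 if the markup is unbalanced.
Private Function MatchingClosePos(ByVal Text As String, ByVal StartPos As Long, _
        ByVal OpenTok As String, ByVal CloseTok As String) As Long
    Dim depth As Long, scanPos As Long
    Dim nextOpen As Long, nextClose As Long
    depth = 1
    scanPos = StartPos
    Do
        nextOpen = InStr(scanPos, Text, OpenTok, vbTextCompare)
        nextClose = InStr(scanPos, Text, CloseTok, vbTextCompare)
        If nextClose = 0 Then Exit Function          ' unbalanced: returns 0
        If nextOpen > 0 And nextOpen < nextClose Then
            depth = depth + 1                        ' a deeper tag opens first
            scanPos = nextOpen + Len(OpenTok)
        Else
            depth = depth - 1
            If depth = 0 Then
                MatchingClosePos = nextClose
                Exit Function
            End If
            scanPos = nextClose + Len(CloseTok)
        End If
    Loop
End Function

' Sketch only: drops an outer <ForTag ...> when it wraps exactly one inner <ForTag ...>.
Function RemoveOuterNestedTagSketch(ByVal FromString As String, _
        Optional ByVal ForTag As String = "span") As String
    Dim openTok As String, closeTok As String, result As String
    Dim p As Long, q As Long, innerOpen As Long, innerEnd As Long
    Dim innerClose As Long, outerClose As Long

    openTok = "<" & ForTag
    closeTok = "</" & ForTag & ">"
    result = FromString
    p = InStr(1, result, openTok, vbTextCompare)

    Do While p > 0
        q = InStr(p, result, ">")                    ' end of this opening tag
        If q = 0 Then Exit Do                        ' malformed: stop scanning
        innerOpen = SkipWhitespace(result, q + 1)
        outerClose = 0
        If StrComp(Mid$(result, innerOpen, Len(openTok)), openTok, vbTextCompare) = 0 Then
            innerEnd = InStr(innerOpen, result, ">")
            If innerEnd > 0 Then
                innerClose = MatchingClosePos(result, innerEnd + 1, openTok, closeTok)
                If innerClose > 0 Then
                    ' The outer close must directly follow the inner close
                    outerClose = SkipWhitespace(result, innerClose + Len(closeTok))
                    If StrComp(Mid$(result, outerClose, Len(closeTok)), closeTok, vbTextCompare) <> 0 Then outerClose = 0
                End If
            End If
        End If

        If outerClose > 0 Then
            ' Remove the outer closing tag first so the earlier positions stay valid
            result = Left$(result, outerClose - 1) & Mid$(result, outerClose + Len(closeTok))
            result = Left$(result, p - 1) & Mid$(result, q + 1)
            p = InStr(p, result, openTok, vbTextCompare)   ' re-check the former inner tag
        Else
            p = InStr(q + 1, result, openTok, vbTextCompare)
        End If
    Loop
    RemoveOuterNestedTagSketch = result
End Function

Sub DemoRemoveOuterNestedTagSketch()
    ' Prints: <span id="b">123</span>
    Debug.Print RemoveOuterNestedTagSketch("<span id=""a""><span id=""b"">123</span></span>")
End Sub
```

Re-checking the former inner tag after each removal lets the sketch collapse more than two levels of nesting in one call. Markup that does not meet the stated assumptions is returned unchanged from the point where scanning stops.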
