A Simple HTML / CSS Parser With Objective-C

One of the biggest challenges in building ShopLater, an app that gets you the latest prices for products you love, was figuring out how to parse the HTML of a retailer’s product page to extract the product’s price, image, and title. In Ruby, I would just use the amazing Nokogiri gem: give it a CSS selector, and it’ll find the content inside the matching tags.

After trying out the awful HPPLE library, we knew we needed a different solution. HPPLE queries the page with XPath, and the XPath queries we ended up with were very specific: if the retailer changed a random div on the page, our system would break. Matching on CSS class names is a lot more reliable, since the class name a retailer uses for a price is unlikely to change very often.

The idea for our parser came from a very simple premise: the HTML on a page is just a string. And with strings, you can use tools like regular expressions and, more interestingly as we discovered, the NSScanner class. Here is our simple CSS parser:

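+ (NSString *)scanString:(NSString *)string
                startTag:(NSString *)startTag
                  endTag:(NSString *)endTag
{
    // Returns the text between the first occurrence of startTag and the
    // next occurrence of endTag, or nil if either tag is missing.
    if (string.length == 0) {
        return @"";
    }

    NSScanner *scanner = [NSScanner scannerWithString:string];

    // Skip ahead to the start tag, then consume it; bail out if it never appears.
    [scanner scanUpToString:startTag intoString:NULL];
    if (![scanner scanString:startTag intoString:NULL]) {
        return nil;
    }

    // Collect everything up to, but not including, the end tag.
    NSString *result = @"";
    [scanner scanUpToString:endTag intoString:&result];

    // Without a matching end tag, the scan above would hold the rest of the page.
    if (![scanner scanString:endTag intoString:NULL]) {
        return nil;
    }

    return result;
}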

To use the method above, pass in your HTML string along with the tags that surround your target, and it returns the text between the start and end tag. For example, suppose the product’s price appears on the page the way it does on Macy’s website:

<span class="priceSale">Now $79.99</span>

You simply pass in the HTML string for the Macy’s product page along with the start and end tags:

// Quotes inside an Objective-C string literal must be escaped.
NSString *price = [Parser scanString:macysHTMLString
                            startTag:@"<span class=\"priceSale\">"
                              endTag:@"</span>"];

You’ll get back @"Now $79.99" as the result string.
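The result is still display text. If you need the price as a number, for example to compare it with the last price you stored, NSScanner can handle that step too. The sketch below assumes US-style formatting (a "." decimal point and "," thousands separators), and `PriceFromDisplayString` is a hypothetical helper, not part of the original parser. It returns an `NSDecimalNumber` rather than a `double` so that cents are represented exactly.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: turn a scraped display string such as @"Now $79.99"
// into a number. Assumes US-style formatting ("." decimal point, ","
// thousands separator). Returns nil when the string contains no digits.
static NSDecimalNumber *PriceFromDisplayString(NSString *display) {
    NSScanner *scanner = [NSScanner scannerWithString:display];
    NSCharacterSet *digits =
        [NSCharacterSet characterSetWithCharactersInString:@"0123456789"];
    NSCharacterSet *numberChars =
        [NSCharacterSet characterSetWithCharactersInString:@"0123456789.,"];

    // Skip the label and currency symbol (e.g. "Now $") up to the first digit.
    [scanner scanUpToCharactersFromSet:digits intoString:NULL];

    // Collect the digits, decimal point and separators, e.g. @"79.99".
    NSString *raw = nil;
    if (![scanner scanCharactersFromSet:numberChars intoString:&raw]) {
        return nil;
    }

    // Drop thousands separators so @"1,299.99" parses as 1299.99.
    NSString *plain = [raw stringByReplacingOccurrencesOfString:@","
                                                     withString:@""];
    return [NSDecimalNumber decimalNumberWithString:plain];
}
```

With this helper, `PriceFromDisplayString(@"Now $79.99")` yields 79.99 and `PriceFromDisplayString(@"Sold out")` yields nil.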

For more complicated HTML, where the price might sit in a random span with no identifying class, start by passing in the start and end tags of a uniquely identifiable element that wraps it. You get back a smaller string that still contains the extra tags; then pass that result back into the scanner with more specific tags, repeating until you have narrowed it down to the item you need.
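A minimal sketch of that narrowing approach, written as plain C functions so it compiles on its own. `ScanBetween` is a hypothetical stand-in for the `scanString:startTag:endTag:` method (returning nil when a tag is missing), and the `productPrice` wrapper markup is made up for illustration rather than taken from a real retailer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for +scanString:startTag:endTag:. Returns the text
// between the first startTag and the following endTag, or nil if either
// tag is missing.
static NSString *ScanBetween(NSString *html, NSString *startTag, NSString *endTag) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    [scanner scanUpToString:startTag intoString:NULL];
    if (![scanner scanString:startTag intoString:NULL]) {
        return nil;
    }
    NSString *inner = @"";
    [scanner scanUpToString:endTag intoString:&inner];
    if (![scanner scanString:endTag intoString:NULL]) {
        return nil;
    }
    return inner;
}

// Two passes: first cut out the uniquely identified wrapper (hypothetical
// class "productPrice"), then look for the anonymous <span> only inside it.
static NSString *ExtractWrappedPrice(NSString *html) {
    NSString *wrapper = ScanBetween(html, @"<div class=\"productPrice\">", @"</div>");
    if (wrapper == nil) {
        return nil;
    }
    return ScanBetween(wrapper, @"<span>", @"</span>");
}
```

Given the page `<span>Free shipping</span><div class="productPrice"><span>$79.99</span></div>`, a single `ScanBetween(page, @"<span>", @"</span>")` returns the unrelated `Free shipping`, while `ExtractWrappedPrice(page)` returns `$79.99`. One caveat of plain string scanning: the first pass stops at the first `</div>` it finds, so a wrapper that contains nested `div`s gets cut short and needs a more specific end tag.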
