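First, parse the HTML with the third-party library Kanna.
Once the page is parsed, use XPath to locate the content you want to display. Here we display the site's title and its icon (the `apple-touch-icon`, used as the favicon).
The code is as follows: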
```swift
import SwiftUI
import Kanna

struct ContentView: View {
    @State private var imageUrl: String = ""
    @State private var webTitle: String = ""

    var body: some View {
        VStack {
            Text(webTitle)
            AsyncImage(url: URL(string: imageUrl)) { image in
                image
                    .resizable()
                    .scaledToFit()
                    .frame(width: 40, height: 40)
                    .clipped()
                    .cornerRadius(4)
            } placeholder: {
                ProgressView()
            }
        }
        .onAppear {
            guard let url = URL(string: "https://wonderhoi.com") else { return }
            // Download the page's HTML
            let task = URLSession.shared.dataTask(with: url) { data, _, error in
                guard error == nil else {
                    print(error!)
                    return
                }
                guard let data = data else {
                    print("data is nil")
                    return
                }
                guard let html = String(data: data, encoding: .utf8) else {
                    print("the response is not in UTF-8")
                    return
                }
                // Parse the HTML with Kanna
                guard let doc = try? HTML(html: html, encoding: .utf8) else { return }
                let title = doc.title ?? "None"
                // @State must be updated on the main thread
                DispatchQueue.main.async {
                    webTitle = title
                }
                let xpath = "//meta[@rel = 'apple-touch-icon']/@content | //link[@rel = 'apple-touch-icon']/@href | //meta[@rel = 'apple-touch-icon']/@href | //link[@rel = 'apple-touch-icon']/@content"
                for appleTouchIcon in doc.xpath(xpath) {
                    // The attribute may hold a relative path such as "/apple-touch-icon.png",
                    // so resolve it against the page URL instead of force-unwrapping URL(string:)
                    guard let iconString = appleTouchIcon.text,
                          let iconUrl = URL(string: iconString, relativeTo: url)?.absoluteURL else { continue }
                    // Download the icon and only use it if the data decodes as an image
                    let iconTask = URLSession.shared.dataTask(with: iconUrl) { data, _, error in
                        guard error == nil else {
                            print(error!)
                            return
                        }
                        guard let data = data else {
                            print("data is nil")
                            return
                        }
                        guard UIImage(data: data) != nil else {
                            print("no picture")
                            return
                        }
                        DispatchQueue.main.async {
                            imageUrl = iconUrl.absoluteString
                        }
                    }
                    iconTask.resume()
                }
            }
            task.resume()
        }
    }
}
```
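Many sites write the `apple-touch-icon` link as a relative path such as `/apple-touch-icon.png`, and a relative URL like that cannot be requested on its own. The sketch below, which uses Foundation only, shows how `URL(string:relativeTo:)` resolves such a reference against the page it came from. `resolveIconURL` is a hypothetical helper, and the `example.com` URLs are made-up inputs for illustration.

```swift
import Foundation

// Hypothetical helper: resolve an icon reference (the raw href/content text)
// against the URL of the page it was found on. Absolute references pass through.
func resolveIconURL(_ reference: String, pageURL: URL) -> URL? {
    // Returns nil if the reference is not a valid URL string
    return URL(string: reference, relativeTo: pageURL)?.absoluteURL
}

let page = URL(string: "https://example.com/blog/")!
// Root-relative path: resolved against the host
print(resolveIconURL("/apple-touch-icon.png", pageURL: page)!.absoluteString)
// https://example.com/apple-touch-icon.png
// Document-relative path: resolved against the page's directory
print(resolveIconURL("icon.png", pageURL: page)!.absoluteString)
// https://example.com/blog/icon.png
// Absolute URL: returned unchanged
print(resolveIconURL("https://cdn.example.com/icon.png", pageURL: page)!.absoluteString)
// https://cdn.example.com/icon.png
```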
The XPath expression joins four paths with `|`, the union operator, so a match from any of them is returned:

```
//meta[@rel = 'apple-touch-icon']/@content | //link[@rel = 'apple-touch-icon']/@href | //meta[@rel = 'apple-touch-icon']/@href | //link[@rel = 'apple-touch-icon']/@content
```
Take `//meta[@rel = 'apple-touch-icon']/@content` as an example:
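- **//**: selects matching nodes anywhere in the document, at any depth
- meta: the tag name of the node
- **@**: marks an attribute; inside `[...]` it forms a condition, so `[@rel = 'apple-touch-icon']` keeps only nodes whose `rel` attribute equals `apple-touch-icon`
- rel: the name of the node's attribute
- **/@**: selects an attribute of the matched node (here `content`), whose value is then read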
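To make the pattern concrete, the sketch below assembles the same expression from its parts in plain Swift. `attributePath` is a hypothetical helper, not part of Kanna; it only builds the query string, and evaluating it is still done by Kanna's `doc.xpath(...)`. It assumes the values contain no single quotes.

```swift
// Hypothetical helper: builds one location path of the form
// //tag[@filterAttribute = 'value']/@target
func attributePath(tag: String, filterAttribute: String, value: String, target: String) -> String {
    return "//\(tag)[@\(filterAttribute) = '\(value)']/@\(target)"
}

// The expression in this post is the union (|) of four such paths:
// <meta> or <link>, read from either @content or @href
let paths = [
    attributePath(tag: "meta", filterAttribute: "rel", value: "apple-touch-icon", target: "content"),
    attributePath(tag: "link", filterAttribute: "rel", value: "apple-touch-icon", target: "href"),
    attributePath(tag: "meta", filterAttribute: "rel", value: "apple-touch-icon", target: "href"),
    attributePath(tag: "link", filterAttribute: "rel", value: "apple-touch-icon", target: "content"),
]
let appleTouchIconXPath = paths.joined(separator: " | ")
print(appleTouchIconXPath)
```

Swapping the filter attribute and value gives other queries with the same shape, for example `attributePath(tag: "meta", filterAttribute: "property", value: "og:image", target: "content")`.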
References:
- Advanced applications of XPath in Python (XPath在python中的高级应用)
- A summary of XPath syntax in Selenium (selenium之xpath语法总结)
- An SEO’s guide to XPath
You can also get the site's Open Graph image (`og:image`) with `//meta[@property = 'og:image']/@content`.
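If a site has no `apple-touch-icon`, one option is to fall back to `og:image` and finally to `/favicon.ico` at the site root, the location browsers request by convention. This fallback order is an assumption of this sketch, not something Kanna or the site defines. `og:image` is usually a large preview image rather than an icon, so it may need different sizing. `pickIconURL` is a hypothetical helper that takes the raw attribute text from each XPath query (or `nil` if nothing matched).

```swift
import Foundation

// Hypothetical fallback order: apple-touch-icon, then og:image, then /favicon.ico.
// Each candidate is the raw attribute text from an XPath query, or nil if no match.
func pickIconURL(appleTouchIcon: String?, ogImage: String?, pageURL: URL) -> URL? {
    for candidate in [appleTouchIcon, ogImage] {
        // Skip missing or blank values; resolve relative references against the page
        if let value = candidate?.trimmingCharacters(in: .whitespaces), !value.isEmpty,
           let url = URL(string: value, relativeTo: pageURL)?.absoluteURL {
            return url
        }
    }
    // Nothing usable was found: try the conventional favicon location
    return URL(string: "/favicon.ico", relativeTo: pageURL)?.absoluteURL
}

let site = URL(string: "https://example.com/")!
print(pickIconURL(appleTouchIcon: nil, ogImage: "https://example.com/cover.png", pageURL: site)!.absoluteString)
// https://example.com/cover.png
print(pickIconURL(appleTouchIcon: nil, ogImage: nil, pageURL: site)!.absoluteString)
// https://example.com/favicon.ico
```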