3.1 Basic definition
Given a directed weighted graph G = (V, E), V is the set of nodes and E is the set of edges. As shown in Fig. 1, a node p ∈ V represents a POI. Each POIp has a category attribute Catp (such as church, museum, or beach), a longitude, and a latitude, and the value on the node p represents the score of POIp. C represents the set of categories of all POIs. In the attribute (ci, Di) of the node pi, ci represents the category attribute of the POI, and Di represents the distance of the POI. Each directed edge (pi, pj) represents a feasible route between two POIs, the number of edges is |E|, and the weight on the edge represents the travel time (in h) between the two POIs when they are visited consecutively.
A travel route is a sequence consisting of multiple travel POIs, denoted as R = {p1, p2, …, pN}, where pi is the tourist location included in the route, and N is the number of locations.
The travel time between two POIs. The travel time required by the user from \( {POI}_{p_x} \) to \( {POI}_{p_y} \) is defined as follows:
$$ {T}^{Travel}\left({p}_x,{p}_y\right)= Dist\left({p}_x,{p}_y\right)/ speed $$
(1)
where Dist(px, py) represents the distance between px and py, calculated with the Haversine formula [20]. The user is assumed to walk between POIs at a speed of 6 km/h.
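As an illustration of Eq. (1), the following minimal Python sketch computes the Haversine distance and the resulting walking time. The function names, the representation of a POI as a (latitude, longitude) tuple in degrees, and the Earth radius of 6371 km are assumptions made for this example and are not specified in the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumed value)
WALKING_SPEED_KMH = 6.0    # walking speed stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_time_h(p_x, p_y, speed_kmh=WALKING_SPEED_KMH):
    """Eq. (1): travel time in hours between two POIs given as (lat, lon)."""
    return haversine_km(*p_x, *p_y) / speed_kmh
```

Because the Haversine distance is symmetric, this sketch gives the same travel time in both directions; a directed graph with asymmetric edge weights would need a different distance source.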
The preference vector of a user u is expressed as IntP(u) = 〈Int(u, c1), Int(u, c2), …, Int(u, ci)〉, where Int(u, ci) represents the degree of preference of the user u for the POI category ci.
Given a user u and the set of POIs he/she has visited, the user’s historical travel footprint is defined in chronological order as \( {S}_u=\left(\left({p}_1,{t}_{p_1}^a,{t}_{p_1}^d\right),\left({p}_2,{t}_{p_2}^a,{t}_{p_2}^d\right)\dots, \left({p}_n,{t}_{p_n}^a,{t}_{p_n}^d\right)\right) \). Each triplet \( \left({p}_x,{t}_{p_x}^a,{t}_{p_x}^d\right) \) consists of a \( {POI}_{p_x} \) that the user has visited, the time \( {t}_{p_x}^a \) of arrival at px, and the time \( {t}_{p_x}^d \) of departure from px. The time of the first photo taken by the user at a POI is taken as the arrival time, and the time of the last photo as the departure time. The user’s access time at px (that is, the duration of user u’s stay at px) is given by the difference \( {t}_{p_x}^d-{t}_{p_x}^a \). Similarly, for the travel sequence Su, \( {t}_{p_1}^a \) and \( {t}_{p_n}^d \) represent the start and end times of the journey, respectively. For simplicity, this paper abbreviates \( {S}_u=\left(\left({p}_1,{t}_{p_1}^a,{t}_{p_1}^d\right),\left({p}_2,{t}_{p_2}^a,{t}_{p_2}^d\right)\dots, \left({p}_n,{t}_{p_n}^a,{t}_{p_n}^d\right)\right) \) as Su = (p1, p2, …, pn).
Given the travel footprint Su = (p1, p2, …, pn) of the user u, Su is divided into individual travel sequences (that is, subsequences of Su) wherever \( {t}_{p_{x+1}}^a-{t}_{p_x}^d>\tau \). In other words, if the time between consecutive visits to two POIs is greater than the threshold τ, the travel footprint is split at that point into separate travel subsequences. The time threshold τ is set to 8 h in this paper.
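The splitting rule can be sketched in a few lines of Python. The function name and the representation of a visit as a (poi, arrival, departure) tuple with `datetime` values are assumptions for illustration only.

```python
from datetime import datetime, timedelta

TAU = timedelta(hours=8)  # threshold value given in the text


def split_footprint(visits, tau=TAU):
    """Split a chronological list of (poi, arrival, departure) triples.

    A new travel sequence starts whenever the gap between leaving one POI
    and arriving at the next one is strictly greater than tau.
    """
    sequences = []
    for visit in visits:
        # sequences[-1][-1][2] is the departure time of the previous visit.
        if sequences and visit[1] - sequences[-1][-1][2] <= tau:
            sequences[-1].append(visit)
        else:
            sequences.append([visit])
    return sequences
```

A gap of exactly τ keeps the two visits in the same sequence, because the condition in the text is a strict inequality.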
This paper gives a POI scoring method considering location distance and user preferences. The score for \( {POI}_{p_i} \) is expressed as score(pi):
$$ score\left({p}_i\right)=\alpha \times Int\left(u,{c}_i\right)+\left(1-\alpha \right)D\left({p}_i\right) $$
(2)
where ci is the category of \( {POI}_{p_i} \), D(pi) is the distance of \( {POI}_{p_i} \), and α is the user adjustment parameter, which is used to adjust the proportion of user interest preference and POI distance in the route.
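Eq. (2) is a weighted sum, which the following short Python sketch makes explicit. The function name, the default α = 0.5, and the restriction of α to [0, 1] are assumptions for this example; the text does not state how D(pi) is scaled, so the caller is assumed to supply both terms on comparable scales.

```python
def score(interest, distance_term, alpha=0.5):
    """Eq. (2): weighted sum of Int(u, c_i) and the distance term D(p_i).

    alpha = 1 uses only the user interest; alpha = 0 uses only the
    distance term. The range check is an assumption of this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * interest + (1.0 - alpha) * distance_term
```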
3.2 Tourist route recommendation framework
As shown in Fig. 2, travel route recommendation in this paper is divided into three parts: the construction of the POI association graph, the learning of the user’s interest preferences, and the route recommendation itself. The first two parts are performed offline, and the distance of each POI and the user’s interest preferences are obtained by analyzing the photos taken by users. The route recommendation is performed online. Assume that the user wants to visit a city with m POIs, recorded as P = {p1, p2, …, pm}. Given the POI set P, the time budget B, the starting point \( {POI}_{p_1} \), and the ending point \( {POI}_{p_n} \), the proposed algorithm, which is based on the orienteering problem and combines user interest preferences with POI distance, recommends the route with the highest score to the user.
3.3 Construction of POI correlation graph
The construction of the POI association graph takes place offline. In this paper, the POIs in the travel sequences of all users are used as the nodes of the graph, each representing a tourist location, and consecutive visits by users within a travel sequence generate the edges of the graph.
The structure of the photo data shared by the user is (PhotoID, UserID, time, longitude, latitude, category). The photo data therefore contain the exact spatiotemporal location information of the user. Based on the longitude and latitude of each photo, the Haversine formula [20] is used to calculate the distance between each photo shared by the user and each POI in the city visited. If the distance is less than 200 m, the photo is considered to have been taken at that POI, which yields the user’s POI list Su = (p1, p2, …, pn).
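The mapping from photos to a visit sequence can be sketched as follows. The `Photo` record, the function names, and the `pois` dictionary (POI id to (latitude, longitude)) are hypothetical. The text does not say how a photo within 200 m of several POIs is handled; this sketch assumes it is assigned to the nearest one, and that consecutive photos at the same POI form a single visit whose arrival and departure are the first and last photo times.

```python
import math
from collections import namedtuple

# Hypothetical record mirroring the photo structure described in the text.
Photo = namedtuple("Photo", "photo_id user_id time lon lat category")


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def photos_to_visits(photos, pois, max_km=0.2):
    """Turn one user's photos into chronological (poi, arrival, departure).

    Assumes pois is non-empty. Photos that are not within max_km of any
    POI are dropped.
    """
    visits = []
    for photo in sorted(photos, key=lambda ph: ph.time):
        dist, poi = min(
            (haversine_km(photo.lat, photo.lon, lat, lon), pid)
            for pid, (lat, lon) in pois.items()
        )
        if dist >= max_km:
            continue  # the text requires a distance of less than 200 m
        if visits and visits[-1][0] == poi:
            # Same POI as the previous photo: extend the departure time.
            visits[-1] = (poi, visits[-1][1], photo.time)
        else:
            visits.append((poi, photo.time, photo.time))
    return visits
```

A visit documented by a single photo has equal arrival and departure times in this sketch, so its stay time is zero.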
A time-based user interest preference is derived from the user’s historical travel footprint. When a user visits a POI, he/she stays there for a certain period of time. The access time (i.e., the stay time) at each POI that each user has visited is calculated from the historical travel footprints of all users according to definition 4, so that the average time a user needs to visit any given POI can be calculated. In the travel route recommendation in this paper, \( \overline{V}(p) \) denotes the average access time of users at POIp. The average access time for each POIp is as follows:
$$ \overline{V}(p)=\frac{1}{n}\sum \limits_{u\in U}\sum \limits_{p_x\in {S}_u}\left({t}_{p_x}^d-{t}_{p_x}^a\right)\sigma \left({p}_x=p\right);\forall p\in P $$
(3)
where U represents the set of all users, n represents the number of users in U who have visited p, and \( \sigma \left({p}_x=p\right)=\left\{\begin{array}{c}1,{p}_x=p\\ {}0,\mathrm{otherwise}\end{array}\right. \).
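A minimal Python sketch of Eq. (3) follows. The function name and the input layout (a dictionary from user to a list of (poi, arrival, departure) triples with times in hours) are assumptions for illustration.

```python
from collections import defaultdict


def average_visit_time(footprints):
    """Eq. (3): total stay at each POI divided by its number of visitors.

    footprints maps user -> list of (poi, arrival, departure); n is taken
    as the number of distinct users who visited the POI, as in the text.
    """
    total = defaultdict(float)
    visitors = defaultdict(set)
    for user, visits in footprints.items():
        for poi, arrive, depart in visits:
            total[poi] += depart - arrive
            visitors[poi].add(user)
    return {poi: total[poi] / len(visitors[poi]) for poi in total}
```

Because n counts users rather than visits, a user who visits the same POI twice contributes both stays to the numerator but is counted once in the denominator.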
However, the average access time of the user at each POI does not truly reflect his/her degree of interest preference for this type of POI. Therefore, a time-based user interest preference is proposed in this paper. The preference degree of user u to category attribute c of POI is calculated from the following equation.
$$ Int\left(u,c\right)=\sum \limits_{p_x\in {S}_u}\frac{\left({t}_{p_x}^d-{t}_{p_x}^a\right)}{\overline{V}\left({p}_x\right)}\sigma \left({Cat}_{p_x}=c\right);\forall c\in C $$
(4)
where Catp represents the category attribute of POIp, and \( \sigma \left({Cat}_{p_x}=c\right)=\left\{\begin{array}{c}1,{Cat}_{p_x}=c\\ {}0,\mathrm{otherwise}\end{array}\right. \). The above equation determines the interest of user u in POIs with category attribute c. It is calculated from the time the user spends at each POI with category attribute c, relative to the average access time of all users at the same POI. In other words, a user is likely to spend more time at the types of POI he or she is interested in, so the time spent indicates the user’s level of interest in such POIs.
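Eq. (4) can be sketched in Python as follows. The function name and the three inputs (the user's visits, a dictionary of average access times, and a dictionary from POI to category) are assumptions for this example, and every average access time is assumed to be positive.

```python
from collections import defaultdict


def interest_by_category(visits, avg_time, category):
    """Eq. (4): sum of (stay / average stay), grouped by POI category.

    visits: list of (poi, arrival, departure) for one user.
    avg_time: dict poi -> average access time (assumed > 0).
    category: dict poi -> category attribute.
    """
    interest = defaultdict(float)
    for poi, arrive, depart in visits:
        interest[category[poi]] += (depart - arrive) / avg_time[poi]
    return dict(interest)
```

For instance, a user who stays 2 h at a museum whose average access time is 3 h contributes 2/3 to the museum category, while a stay equal to the average contributes exactly 1.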
3.4 Proposed algorithm
The orienteering problem (OP) is described as follows. In a directed weighted graph G(V, E), V is the set of all nodes and E is the set of all edges. Each node has a score (the gain of visiting it), and each edge has a weight (the walking time between its two nodes). The start and end points are specified. The task is to select a subset of the nodes of G and plan a path from the start point through the selected nodes to the end point such that the total score of the path is maximized without exceeding a given time budget.
OP has been widely used in travel route recommendation. Based on the orienteering problem, the proposed route recommendation algorithm takes both POI distance and user interest into account. Given the set P, the time budget B, the starting point \( {POI}_{p_1} \), and the end point \( {POI}_{p_n} \), the proposed algorithm recommends a route R = {p1, p2, …, pn} that satisfies the time budget B and has the highest score. The time consumed against the budget is calculated by the function Cost(px, py), \( Cost\left({p}_x,{p}_y\right)={T}^{Travel}\left({p}_x,{p}_y\right)+\overline{V}\left({p}_y\right) \), that is, the travel time from px to py plus the average access time at py. Therefore, the travel route recommendation model in this paper can be formulated as an integer programming problem with multiple constraints, expressed as follows:
$$ \mathit{\operatorname{Max}}\sum \limits_{i=2}^{N-1}\sum \limits_{j=2}^N{x}_{i,j} score\left({p}_i\right) $$
(5)
where xi, j = 1 if the route goes directly from pi to pj, that is, through the edge (pi, pj), and xi, j = 0 otherwise. The objective is subject to the following constraints:
$$ \sum \limits_{j=2}^N{x}_{1,j}=\sum \limits_{i=1}^{N-1}{x}_{i,N}=1 $$
(6)
$$ \sum \limits_{j=2}^N{x}_{k,j}=\sum \limits_{i=1}^{N-1}{x}_{i,k}\le 1;\forall k=2,3,\dots, N-1 $$
(7)
$$ \sum \limits_{i=1}^{N-1}\sum \limits_{j=2}^N Cost\left(i,j\right){x}_{i,j}\le B $$
(8)
$$ 2\le {u}_i\le N $$
(9)
$$ {u}_i-{u}_j+1\le \left(N-1\right)\left(1-{x}_{i,j}\right);\forall i,j=2,3,\dots, N $$
(10)
Equation (5) is the objective function, which maximizes the total score of the recommended route, where the score combines POI distance and user interest preferences. Eqs. (6) to (10) are the constraints of Eq. (5). Equation (6) ensures that the user’s route starts at p1 and ends at pN. Equation (7) ensures that the route is connected and that each POI in the route is visited at most once. Equation (8) guarantees that the time spent by the user on the entire trip is within the budget B. With ux denoting the position of px in route R, Eq. (9) and Eq. (10) ensure that the solution of the integer programming problem contains no sub-tours.
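For very small instances, the model in Eqs. (5) to (10) can be checked by exhaustive search. The following Python sketch is not the solver used in this paper; it is a depth-first enumeration of simple paths, with a hypothetical function name and input layout. The scores and the values of Cost(px, py) are assumed to be precomputed by the caller. The running time grows exponentially with the number of POIs.

```python
def best_route(scores, cost, start, end, budget):
    """Exhaustive search for a small orienteering instance.

    scores: dict poi -> score(p).
    cost: dict (p_x, p_y) -> Cost(p_x, p_y) in hours; missing keys mean
          that no edge exists.
    Returns (total_score, route) for the best simple path from start to
    end within budget, or (None, None) if no feasible path exists.
    """
    best = (None, None)

    def extend(route, used, gained):
        nonlocal best
        last = route[-1]
        if last == end:
            if best[0] is None or gained > best[0]:
                best = (gained, list(route))
            return
        for nxt in scores:
            # Each POI is visited at most once, as required by Eq. (7).
            if nxt in route or (last, nxt) not in cost:
                continue
            spent = used + cost[(last, nxt)]
            if spent <= budget:  # budget constraint, Eq. (8)
                # Eq. (5) sums scores of intermediate POIs only.
                gain = 0.0 if nxt == end else scores[nxt]
                extend(route + [nxt], spent, gained + gain)

    extend([start], 0.0, 0.0)
    return best
```

Building paths node by node makes sub-tours impossible by construction, so the position variables of Eqs. (9) and (10) are not needed in this sketch. When several routes tie for the best score, the first one found is returned.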