### 3.1 Basic definition

Let *G* = (*V*, *E*) be a directed weighted graph, where *V* is the set of nodes and *E* is the set of edges. As shown in Fig. 1, a node *p* ∈ *V* represents a *POI*. Each *POI*_{p} has a category attribute *Cat*_{p} (such as church, museum, or beach), a longitude, and a latitude, and the value on node *p* is the score of *POI*_{p}. *C* denotes the set of categories of all *POIs*. In the attribute pair (*c*_{i}, *D*_{i}) of node *p*_{i}, *c*_{i} is the category of the *POI* and *D*_{i} is its distance. Each directed edge (*p*_{i}, *p*_{j}) represents a feasible route between two *POIs*; the number of edges is |*E*|, and the weight on an edge is the travel time (in hours) between two consecutively visited *POIs*.

A travel route is a sequence of *POIs*, denoted as *R* = {*p*_{1}, *p*_{2}, …, *p*_{N}}, where *p*_{i} is the *i*-th tourist location in the route and *N* is the number of locations.

The travel time between two *POIs*, i.e., the time the user needs to travel from \( {POI}_{p_x} \) to \( {POI}_{p_y} \), is defined as follows:

$$ {T}^{Travel}\left({p}_x,{p}_y\right)= Dist\left({p}_x,{p}_y\right)/ speed $$

(1)

where *Dist*(*p*_{x}, *p*_{y}) represents the distance between *p*_{x} and *p*_{y}, calculated by the Haversine formula [20]. The user is assumed to travel between *POIs* on foot at a speed of 6 km/h.
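Eq. (1) can be illustrated with a minimal Python sketch. The function names, the representation of a POI as a `(latitude, longitude)` pair in degrees, and the Earth radius of 6371 km are assumptions made for illustration; only the Haversine distance and the 6 km/h walking speed come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
WALKING_SPEED_KMH = 6.0    # walking speed stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def travel_time_h(p_x, p_y, speed_kmh=WALKING_SPEED_KMH):
    """Eq. (1): travel time in hours between two POIs given as (lat, lon)."""
    return haversine_km(*p_x, *p_y) / speed_kmh
```

For example, two POIs 3 km apart yield a travel time of 0.5 h at the assumed walking speed.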

The preference vector of a user *u* is expressed as *IntP*(*u*) = 〈*Int*(*u*, *c*_{1}), *Int*(*u*, *c*_{2}), …, *Int*(*u*, *c*_{i})〉, where *Int*(*u*, *c*_{i}) represents the degree of preference of the user *u* for the *POI* category *c*_{i}.

Given a user *u* and the *POIs* he/she has visited, the user's historical travel footprint is defined in chronological order as \( {S}_u=\left(\left({p}_1,{t}_{p_1}^a,{t}_{p_1}^d\right),\left({p}_2,{t}_{p_2}^a,{t}_{p_2}^d\right)\dots, \left({p}_n,{t}_{p_n}^a,{t}_{p_n}^d\right)\right) \). Each triplet \( \left({p}_x,{t}_{p_x}^a,{t}_{p_x}^d\right) \) consists of a \( {POI}_{p_x} \) that the user has visited, the time \( {t}_{p_x}^a \) of arrival at *p*_{x}, and the time \( {t}_{p_x}^d \) of departure from *p*_{x}. The time of the first photo taken by the user at a *POI* is taken as the arrival time, and the time of the last photo as the departure time. The user's access time at *p*_{x} (that is, the duration of the stay of user *u* at *p*_{x}) is therefore \( {t}_{p_x}^d-{t}_{p_x}^a \). Similarly, for the travel sequence *S*_{u}, \( {t}_{p_1}^a \) and \( {t}_{p_n}^d \) represent the start and end times of the journey, respectively. For simplicity, this paper abbreviates \( {S}_u=\left(\left({p}_1,{t}_{p_1}^a,{t}_{p_1}^d\right),\left({p}_2,{t}_{p_2}^a,{t}_{p_2}^d\right)\dots, \left({p}_n,{t}_{p_n}^a,{t}_{p_n}^d\right)\right) \) as *S*_{u} = (*p*_{1}, *p*_{2}, …, *p*_{n}).

Given the travel footprint *S*_{u} = (*p*_{1}, *p*_{2}, …, *p*_{n}) of user *u*, *S*_{u} is split between *p*_{x} and *p*_{x+1} whenever \( {t}_{p_{x+1}}^a-{t}_{p_x}^d>\tau \), which yields a number of individual travel sequences (that is, sub-sequences of *S*_{u}). In other words, if the time between two consecutive *POI* visits is greater than the threshold *τ*, the two visits are assigned to different travel sequences. In this paper, the time threshold *τ* is set to 8 h.
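The splitting rule can be sketched as follows. The function name `split_trips` and the representation of a visit as a `(poi, arrival, departure)` tuple with `datetime` values are assumptions for illustration; the 8 h threshold is the value given in the text.

```python
from datetime import datetime, timedelta

TAU = timedelta(hours=8)  # threshold stated in the text


def split_trips(visits, tau=TAU):
    """Split a chronologically sorted list of (poi, arrival, departure)
    triples into separate travel sequences wherever the gap between one
    departure and the next arrival exceeds tau."""
    trips = []
    for visit in visits:
        arrival = visit[1]
        if trips and arrival - trips[-1][-1][2] <= tau:
            trips[-1].append(visit)   # gap within threshold: same trip
        else:
            trips.append([visit])     # first visit, or gap > tau: new trip
    return trips
```

A user who visits two POIs on one morning and a third POI the next day would thus obtain two travel sequences.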

This paper gives a *POI* scoring method considering location distance and user preferences. The score for \( {POI}_{p_i} \) is expressed as *score*(*p*_{i}):

$$ score\left({p}_i\right)=\alpha \times Int\left(u,{c}_i\right)+\left(1-\alpha \right)D\left({p}_i\right) $$

(2)

where *c*_{i} is the category of \( {POI}_{p_i} \), *D*(*p*_{i}) is the distance of \( {POI}_{p_i} \), and *α* is the user adjustment parameter, which is used to adjust the proportion of user interest preference and *POI* distance in the route.
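As a small worked illustration of Eq. (2), the sketch below evaluates the score for different values of *α*. The function name, the argument names, and the restriction of *α* to the interval [0, 1] are assumptions; the text does not state how *Int*(*u*, *c*_{i}) and *D*(*p*_{i}) are scaled relative to each other.

```python
def poi_score(interest, distance_term, alpha):
    """Eq. (2): weighted combination of the user's interest in the POI's
    category and the POI's distance term D(p_i).

    alpha is assumed to lie in [0, 1]; alpha = 1 uses interest only,
    alpha = 0 uses the distance term only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * interest + (1.0 - alpha) * distance_term
```

With `interest = 2.0` and `distance_term = 0.5`, setting `alpha = 0.5` weights both terms equally and gives 1.25.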

### 3.2 Tourist route recommendation framework

As shown in Fig. 2, travel route recommendation in this paper consists of three parts: construction of the *POI* association graph, learning of the user's interest preference, and route recommendation. The first two parts are performed offline; the distance of each *POI* and the user's interest preference are obtained by analyzing the photos taken by users. Route recommendation is performed online. Assume that the user wants to visit a city with *m* *POIs*, recorded as *P* = {*p*_{1}, *p*_{2}, …, *p*_{m}}. Given the *POI* set *P*, time budget *B*, starting point \( {POI}_{p_1} \), and ending point \( {POI}_{p_n} \), the proposed algorithm, which combines user interest preferences and *POI* distance based on the orienteering problem, recommends the route with the highest score to the user.

### 3.3 Construction of POI correlation graph

The POI association graph is constructed offline. The POIs in the travel sequences of all users serve as the nodes of the graph, each representing a tourist location, and consecutive visits within a user's travel sequence generate the edges of the graph.

Each photo shared by a user has the structure (PhotoID, UserID, time, longitude, latitude, category), so the photo data records the user's exact location in space and time. Based on the longitude and latitude of each photo, the Haversine formula [20] is used to calculate the distance between the photo and each POI in the visited city. If the distance is less than 200 m, the photo is considered to have been taken at that POI, which yields the user's POI list *S*_{u} = (*p*_{1}, *p*_{2}, …, *p*_{n}).
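The photo-to-POI matching can be sketched as below. The dictionary keys (`"time"`, `"lat"`, `"lon"`), the function names, and the choice of the nearest POI when several lie within 200 m are assumptions; the text only specifies the 200 m threshold and that the first and last photos at a POI give the arrival and departure times.

```python
import math

MATCH_RADIUS_M = 200.0  # threshold stated in the text


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def nearest_poi(photo, pois, max_dist_m=MATCH_RADIUS_M):
    """Return the id of the closest POI strictly within max_dist_m, or None.
    pois maps a POI id to a (lat, lon) pair."""
    best_id, best_d = None, max_dist_m
    for poi_id, (lat, lon) in pois.items():
        d = haversine_m(photo["lat"], photo["lon"], lat, lon)
        if d < best_d:
            best_id, best_d = poi_id, d
    return best_id


def build_visits(photos, pois):
    """Collapse one user's photos into (poi, arrival, departure) triples.
    Consecutive matched photos at the same POI form one visit; photos that
    match no POI are ignored (an assumption of this sketch)."""
    visits = []
    for photo in sorted(photos, key=lambda ph: ph["time"]):
        poi_id = nearest_poi(photo, pois)
        if poi_id is None:
            continue
        if visits and visits[-1][0] == poi_id:
            # Later photo at the same POI: extend the departure time.
            visits[-1] = (poi_id, visits[-1][1], photo["time"])
        else:
            visits.append((poi_id, photo["time"], photo["time"]))
    return visits
```

A visit documented by a single photo has equal arrival and departure times and therefore a stay of zero under this scheme.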

A time-based user interest preference is derived from the user's historical travel footprint. A user who visits a POI stays there for a certain period of time. The access time (i.e., the stay time) of each POI visited by each user is calculated from the historical travel footprints of all users according to definition 4, so that the average time users need to visit any given POI can be calculated. In the travel route recommendation in this paper, \( \overline{V}(p) \) denotes the average access time of *POI*_{p} over the users who visited it. The average access time of each *POI*_{p} is as follows:

$$ \overline{V}(p)=\frac{1}{n}\sum \limits_{u\in U}\sum \limits_{p_x\in {S}_u}\left({t}_{p_x}^d-{t}_{p_x}^a\right)\sigma \left({p}_x=p\right);\forall p\in P $$

(3)

where *U* represents all users, and *n* represents the number of users accessing *p* in *U*. \( \sigma \left({p}_x=p\right)=\left\{\begin{array}{c}1,{p}_x=p\\ {}0,\mathrm{other}\end{array}\right. \).
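Eq. (3) can be sketched in a few lines of Python. The function name and the input layout (a dictionary mapping each user to a list of `(poi, arrival, departure)` triples with times in hours) are assumptions. Following the definition of *n* above, the total stay time at a POI is divided by the number of distinct users who visited it.

```python
from collections import defaultdict


def average_visit_time(histories):
    """Eq. (3): average access time per POI.

    histories: {user: [(poi, arrival, departure), ...]}
    Returns {poi: total stay time / number of distinct visitors}.
    """
    total = defaultdict(float)
    visitors = defaultdict(set)
    for user, visits in histories.items():
        for poi, arrive, depart in visits:
            total[poi] += depart - arrive
            visitors[poi].add(user)
    return {poi: total[poi] / len(visitors[poi]) for poi in total}
```

For instance, if one user stays 2 h and another stays 1 h at the same POI, its average access time is 1.5 h.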

However, the average access time of the user at each POI does not truly reflect his/her degree of interest preference for this type of POI. Therefore, a time-based user interest preference is proposed in this paper. The preference degree of user *u* to category attribute *c* of POI is calculated from the following equation.

$$ Int\left(u,c\right)=\sum \limits_{p_x\in {S}_u}\frac{\left({t}_{p_x}^d-{t}_{p_x}^a\right)}{\overline{V}\left({p}_x\right)}\sigma \left({Cat}_{p_x}=c\right);\forall c\in C $$

(4)

where *Cat*_{p} represents the category attribute of *POI*_{p}, and \( \sigma \left({Cat}_{p_x}=c\right)=\left\{\begin{array}{c}1,{Cat}_{p_x}=c\\ {}0,\mathrm{other}\end{array}\right. \). The above equation determines the interest of user *u* in POIs with category attribute *c*. It is calculated from the time the user spends at each POI with category attribute *c*, relative to the average access time of all users at the same POI. In other words, a user tends to spend more time at the types of POI he or she is interested in, so the relative time spent indicates the user's level of interest in such POIs.
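A minimal sketch of Eq. (4) follows. The function and parameter names are hypothetical; `avg_time` is assumed to hold \( \overline{V}(p) \) for each POI and `category_of` the category *Cat*_{p}. Skipping POIs whose average access time is zero is an assumption made here to avoid division by zero; the text does not discuss that case.

```python
from collections import defaultdict


def interest_by_category(visits, avg_time, category_of):
    """Eq. (4): time-based interest of one user in each POI category.

    visits:      [(poi, arrival, departure), ...] for a single user
    avg_time:    {poi: average access time over all users}
    category_of: {poi: category}
    """
    pref = defaultdict(float)
    for poi, arrive, depart in visits:
        mean = avg_time.get(poi, 0.0)
        if mean > 0:  # assumed guard against division by zero
            pref[category_of[poi]] += (depart - arrive) / mean
    return dict(pref)
```

A user who stays 2 h at a museum whose average access time is 1.5 h contributes 2 / 1.5 ≈ 1.33 to his or her interest in the museum category, i.e., above-average interest.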

### 3.4 Proposed algorithm

The orienteering problem (OP) is a route optimization problem described as follows. In a directed weighted graph *G*(*V*, *E*), *V* is the set of all points in the graph and *E* is the set of all edges. Each point has a score (which can be interpreted as a gain), and each edge has a weight (the walking time between its two points). The start and end points are specified. The task is to select a subset of the points of *G* and plan a path from the start point through the selected points to the end point such that the total score of the path is maximized without exceeding a given time budget.

OP has been widely used in travel route recommendation. Based on the orienteering problem, this paper proposes a route recommendation algorithm that considers POI distance and user interest. Given the set *P*, the time budget *B*, the starting point \( {POI}_{p_1} \), and the end point \( {POI}_{p_n} \), the proposed algorithm recommends a route *R* = {*p*_{1}, *p*_{2}, …, *p*_{n}} that satisfies the time budget *B* and has the highest score. The time consumed along the route is calculated by the function *Cost*(*p*_{x}, *p*_{y}), \( Cost\left({p}_x,{p}_y\right)={T}^{Travel}\left({p}_x,{p}_y\right)+\overline{V}\left({p}_y\right) \), i.e., the travel time from *p*_{x} to *p*_{y} plus the average access time at *p*_{y}. Therefore, the travel route recommendation model in this paper can be formulated as an integer programming problem with multiple constraints, which is expressed as follows:

$$ \mathit{\operatorname{Max}}\sum \limits_{i=2}^{N-1}\sum \limits_{j=2}^N{x}_{i,j} score\left({p}_i\right) $$

(5)

where *x*_{i, j} = 1 if the route travels directly from *p*_{i} to *p*_{j}, i.e., traverses the edge (*p*_{i}, *p*_{j}), and *x*_{i, j} = 0 otherwise. The objective is subject to the following constraints:

$$ \sum \limits_{j=2}^N{x}_{1,j}=\sum \limits_{i=1}^{N-1}{x}_{i,N}=1 $$

(6)

$$ \sum \limits_{j=2}^N{x}_{k,j}=\sum \limits_{i=1}^{N-1}{x}_{i,k}\le 1;\forall k=2,3,\dots, N-1 $$

(7)

$$ \sum \limits_{i=1}^{N-1}\sum \limits_{j=2}^N Cost\left(i,j\right){x}_{i,j}\le B $$

(8)

$$ 2\le {u}_i\le N $$

(9)

$$ {u}_i-{u}_j+1\le \left(N-1\right)\left(1-{x}_{i,j}\right);\forall i,j=2,3,\dots, N $$

(10)

Equation (5) is the objective function, which maximizes the total score of the recommended route and thereby combines POI distance and user interest preferences. Eqs. (6) to (10) are the constraints of Eq. (5). Equation (6) ensures that the route starts at *p*_{1} and ends at *p*_{N}. Equation (7) ensures that the route is connected and that each POI in the route is visited at most once. Equation (8) guarantees that the time spent by the user on the entire trip is within the budget *B*. With *u*_{x} denoting the position of *POI*_{x} in route *R*, Eqs. (9) and (10) ensure that the solution of the integer programming problem contains no subtours.
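To make the model concrete, the following sketch solves a small instance of Eqs. (5) to (10) by exhaustive depth-first search. This is not the algorithm proposed in this paper; it is a brute-force illustration whose running time grows exponentially with the number of POIs, and the function name and input layout are assumptions. Building the route as a simple path from the start to the end satisfies Eqs. (6), (7), (9), and (10) by construction, and the budget check enforces Eq. (8). Costs are assumed to be non-negative.

```python
def best_route(scores, cost, budget):
    """Exhaustively solve a small orienteering instance.

    scores: list of N POI scores; index 0 is the start, index N-1 the end.
    cost:   N x N matrix; cost[i][j] is the travel time from i to j plus
            the average access time at j (the Cost function of the text).
    budget: time budget B.
    Returns (total score of intermediate POIs, route as a list of indices),
    or (-inf, None) if even the direct start-to-end route exceeds B.
    """
    end = len(scores) - 1
    best = (float("-inf"), None)

    def extend(route, used, spent, gained):
        nonlocal best
        last = route[-1]
        # Try to finish the route at the end point within the budget.
        if spent + cost[last][end] <= budget and gained > best[0]:
            best = (gained, route + [end])
        # Try every unvisited intermediate POI that still fits the budget.
        for nxt in range(1, end):
            step = cost[last][nxt]
            if nxt not in used and spent + step <= budget:
                extend(route + [nxt], used | {nxt},
                       spent + step, gained + scores[nxt])

    extend([0], {0}, 0.0, 0.0)
    return best
```

As in Eq. (5), only the scores of the intermediate POIs count towards the objective; the start and end points contribute nothing.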